Author 
Thread 


tthsqe
Joined: 20 May 2009
Posts: 713

why is 2*2.5 not the same as 5.0 in fasmg?
code:
Code:  t = trunc (5.0)
display '0' + t, 10
t = trunc (2*2.499999)
display '0' + t, 10
t = trunc (2*2.5)
display '0' + t, 10
t = trunc (2*2.500001)
display '0' + t, 10
t = trunc (5.0)
display '0'  t, 10
t = trunc (2*2.499999)
display '0'  t, 10
t = trunc (2*2.5)
display '0'  t, 10
t = trunc (2*2.500001)
display '0'  t, 10


output:

29 Aug 2017, 15:05 

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 6824
Location: Kraków, Poland

There were some mistakes in the implementation of rounding. Please try version "hwq8c".

29 Aug 2017, 16:14 

tthsqe
Joined: 20 May 2009
Posts: 713

Your version hwq8c does 2*2.5=5.0 but there are still some issues.
Code:  ; when b/a is exactly representable by the float data
; type, I don't think we should get an error here
a = 3
b = 6
if (float a)*((float b)/(float a)) = float b
else
err 'oops'
end if 

So I traced through your code. Correct me if I am wrong on the following points:
(1) floats are represented by a 32 bit exponent and a 32n bit mantissa.
(2) the highest bit in the mantissa of a nonzero float is always 1
(3) a*b is calculated by calculating 64n bits but doing rounding on only 32n + 32 bits.
(4) a/b is calculated as a*(1/b)
I like (1) and especially (2). GMP does not do (2).
I see problems with (3) and (4) with (4) being the most severe.
Here is the problem with (4)
Code:  1/3 = 0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa * 2^129
6 = 0xc0000000000000000000000000000000 * 2^125
6*1/3 = 0x7fffffffffffffffffffffffffffffff80000000000000000000000000000000 * 2^254
this last number is going to get shifted to
0xffffffffffffffffffffffffffffffff * 2^127 

As for (3), why not do the rounding on all 64n bits? The current
code will not round correctly when the bits of lesser significance than
mantissa_tail affect the rounding. I am fairly certain this set of cases
is nonempty.
If you want to fix (4), you are probably going to have to do the gradeschool
algorithm. (I went to grade school but didn't learn this algorithm properly there)
I have some notes on how to do this properly and I could share them with you. This
would save you the headache if you don't have this working already. Because of (2),
this is going to be slightly easier in fasmg than it was for me.
This may sound pedantic, but I would surely prize accuracy from the assembler
over speed.

30 Aug 2017, 20:25 

tthsqe
Joined: 20 May 2009
Posts: 713

Do you allow others to submit patches to fasmg?

30 Aug 2017, 20:31 

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 6824
Location: Kraków, Poland

tthsqe wrote:  Correct me if I am wrong on the following points:
(1) floats are represented by a 32 bit exponent and a 32n bit mantissa.
(2) the highest bit in the mantissa of a nonzero float is always 1
(3) a*b is calculated by calculating 64n bits but doing rounding on only 32n + 32 bits.
(4) a/b is calculated as a*(1/b) 

You are correct. You must have read my code in depth, I feel this is something that does not happen frequently.
tthsqe wrote:  As for (3), why not do the rounding on all 64n bits? 

You're right, it would be correct to use all bits, and it is quite easy to fix. It costs next to nothing while may improve precision in some cases, even if very rare.
tthsqe wrote:  If you want to fix (4), you are probably going to have to do the gradeschool algorithm. (I went to grade school but didn't learn this algorithm properly there)
I have some notes on how to do this properly and I could share them with you. This
would save you the headache if you don't have this working already. Because of (2),
this is going to be slightly easier in fasmg than it was for me. 

I'm not sure if this one is worth the trouble. But you could prove me wrong.

30 Aug 2017, 22:18 

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 6824
Location: Kraków, Poland

The problem here is: if I start making little tweaks like that, it would actually be better to rewrite the entire module with this different goal in mind. In its current form it is based on a premise similar to the one used by fasm 1  that the precision problems are "buffered" by maintaining more bits of the mantissa that required for the intended applications. In fasm 1 you could not convert float into a value larger that 80bit, but internally it was computing more bits, just so it could be little reckless with precision problems, knowing that they are going to be cushioned by the additional bits. In fasmg this "guaranteed" (as stated by the manual) precision is the 128bit representation. But, since fasmg also allows to do some basic arithmetic operations on these values, it is not surprising that one could expect more rigorous approach. The alternative I had in mind was a kind of "adaptive precision" floating point, where the size of mantissa could grow indefinitely if only it was possible to keep the exact result. I guess that variant would be more satisfying to you, but I dropped it thinking that it is not really necessary for an assembler. On the other hand, if it was available, it could probably create some new possible uses of fasmg.
tthsqe wrote:  Do you allow others to submit patches to fasmg? 

I would accept them as long as they would fit my vision. As it was in case of fasm 1 development, I usually end up rewriting any submissions my own way. But in case of fasm 1 a frequent problem was that others could not take into consideration all possible interactions they additions could cause, because code of fasm 1 was quite cryptic. I hope that with fasmg the situation is bit better, though the problems of this kind are still possible.

31 Aug 2017, 15:51 

tthsqe
Joined: 20 May 2009
Posts: 713

By allowing floating point operations into fasmg you have opened a new can of worms. As I see it, the floating point operations should be reliable since they are now a part of the language. I will submit to you a reworked floating point module and see if you like it. It will take a while but will have exact rounding (with ties going to even) for +,,*, and /.

31 Aug 2017, 16:14 

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 6824
Location: Kraków, Poland

What is your opinion on adaptive precision then? If I'd go into direction of better precision, I'd prefer to go all the way.
Not anytime soon, though. When I had the time to work on the floats module, I still dismissed the idea as too complex, and just made one that I deemed enough for the general assembly needs. The are so many other interesting things to do with fasmg...

31 Aug 2017, 17:59 

tthsqe
Joined: 20 May 2009
Posts: 713

By adaptive precision i hope you mean arbitrary precision, in which case the complicated question of how to request the desired precision needs to be adressed. You could have another keyword used like "setfloatprec n" in which case all subsequent operations would be performed with at least n bits. Of course the floats stored in memory at any point in time could have many different precisions, but the calculations would alway product floats with the last set value of n. An asdembler with arbitary precision floats as well as ints would be nice, though the latter can be used to simulate the former.

31 Aug 2017, 18:46 

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 6824
Location: Kraków, Poland

No, what I meant by "adaptive precision" is a bit more specific. Not only the precision would be adjustable, but in case of operations that could give exact result, the exact result would be produced even if it required precision to be increased. For example multiplication of two floats with precision n would produce a number with precision up to 2n (might be smaller if mantissa ends with a bunch of zeros). And when adding a very small number to a very large one, precision would have to grow tremendously to contain the exact result.
In fact, my current implementation could be extended to become what I describe above, because at the time I was writing it I still had the idea in mind, even if I considered it impractical.
tthsqe wrote:  An asdembler with arbitary precision floats as well as ints would be nice, though the latter can be used to simulate the former. 

This actually could be an argument for leaving the floats implementation simple and directing anyone wanting more to use the integer arithmetic to "simulate" arbitrary precision floats.

31 Aug 2017, 18:52 

tthsqe
Joined: 20 May 2009
Posts: 713

So you are essentially talking about arithmetic over Q since correct handling of 1 + 2^1000 while choaking on 2/3 would be considered strange behaviour by almost any user. Even with such adaptive capabilities, there would still have to be a limit on the sizes of the numerators and denominators, otherwise calculations can expload quickly. So when a rational calculation goes over the limit, you would have to choose a smaller denominator.

31 Aug 2017, 19:59 

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 6824
Location: Kraków, Poland

tthsqe wrote:  So you are essentially talking about arithmetic over Q since correct handling of 1 + 2^1000 while choaking on 2/3 would be considered strange behaviour by almost any user. 

fasm was never afraid of making unusual choices. Though usually they have to make some practical sense nonetheless.
Speaking of strange choices, for arithmetic over Q the 2adic approach would be much more in my spirit. This is in fact something I worked on for my PhD thesis, which I sadly never finished.

31 Aug 2017, 20:09 

tthsqe
Joined: 20 May 2009
Posts: 713

Ok. So fasmg should have an interesting floating point format. It doesnt need to be fast since anyone desiring speed can simulate fixed points with ints. Since I am not familiar with completions as much as I would like to be, I would go with 2^e × a / b where a and b are odd. This isnt that interesting, but its light years adead of the boring 1.blahblah × 2^e format, which cant represent most interesting numbers. If log a + log b ever gets too big, you would choose the last convergent in the CF partial quotients that fits. Good idea?

31 Aug 2017, 20:29 

tthsqe
Joined: 20 May 2009
Posts: 713

Thomasz,
The realisation that you must have long division because you have "/" and "mod" for big ints took be to line 3000 or so for expressions.inc. I think the reason people frequently don't read your code is that it is close to inscrutable.
For multiplication by more than one segment, I see a two way split of the input terms and then a three point evaluation. However, I can't follow your long division algorithm. Can you follow mine? https://github.com/tthsqe12/cas/blob/master/idiv.cpp#L78 (most of the functions used in there are self evident, and _estq is defined at the beginning of the file)
Could you give some hints on how your division works? Does it produce the quotient segments in order from high to low? What is the complexity?

05 Sep 2017, 12:41 

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 6824
Location: Kraków, Poland

tthsqe wrote:  Could you give some hints on how your division works? Does it produce the quotient segments in order from high to low? What is the complexity? 

This is a revised version of the algorithm used by fasm 1 for division by 64bit number, it computes iterated approximations until remainder is smaller than the divisor. The first approximation is made with baseline division by 32bit value (made from the most significant bits of the divisor). I chose this algorithm because it has very simple implementation, while still being reasonably fast. In case of fasmg its simplicity may be overshadowed by the complexity of how the expression term data structures need to be handled, but with any more complex division algorithm it would become even worse. The complexity appears quadratic, though I never did an exact analysis.

05 Sep 2017, 13:21 

tthsqe
Joined: 20 May 2009
Posts: 713

Quote:  This is a revised version of the algorithm used by fasm 1 for division by 64bit number, it computes iterated approximations until remainder is smaller than the divisor. The first approximation is made with baseline division by 32bit value (made from the most significant bits of the divisor). 

So it seems you have never implemented a proper grade school division algorithm! I reworked the arbitrary precision floats in my cas project so that rounding is exact and to the nearest even (this of course then requires that the highest bit in the mantissa is set, like in your implementation). The result is clean simple c code that I hope you can translate into the fasmg core. If I get it thoroughly tested, would you be willing to take some time to translate the code into fasmg?

06 Sep 2017, 20:49 

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 6824
Location: Kraków, Poland

tthsqe wrote:  So it seems you have never implemented a proper grade school division algorithm! 

As I explained in the other thread, I did look for as simple and short algorithm as possible, because I did not consider its performance crucial in case of fasmg, the bottlenecks of complex assembly lay completely elsewhere. And with such approach I simply aim at a tiny code (see my remark about HeavyThing), so that at least it does not pollute cache too much.
tthsqe wrote:  I reworked the arbitrary precision floats in my cas project so that rounding is exact and to the nearest even (this of course then requires that the highest bit in the mantissa is set, like in your implementation). The result is clean simple c code that I hope you can translate into the fasmg core. If I get it thoroughly tested, would you be willing to take some time to translate the code into fasmg? 

In my case translating the code would probably end up changing to simply writing my own variant in assembly. It is not that I did not see how to make the floats more robust when I was implementing them, I simply chose not to, considering the possible applications and the time needed (and also see above). You brought a good point that the new uses may arise simply because these features are there now, and for this reason improving the floats could actually bring some value, but that remains to be seen.

06 Sep 2017, 21:43 

tthsqe
Joined: 20 May 2009
Posts: 713

Alight. I hope you can see the value of exact rounding and that we can see it in fasmg sometime (or some other interesting float format). As for code size and temporary workspace, division and multiplication are about the same for the former and equal for the latter.

06 Sep 2017, 23:59 

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 6824
Location: Kraków, Poland

tthsqe wrote:  As for code size and temporary workspace, division and multiplication are about the same for the former and equal for the latter. 

So I implemented neither.
One might think that Karatsuba multiplication would end up taking more space in executable, but because the code consist of multiple calls to functions that are available anyway, it ended up being quite concise and I was happy with it. A similar thing happened with the division.
And, if you look closer, what this algorithm actually achieves is not that much different from the classic long division. On average every iteration refines about 32 bits (so one digit, if we measure in machine words) of the result, starting from the top (as this is just repeatedly refined approximation).

07 Sep 2017, 15:05 


