flat assembler
Message board for the users of flat assembler.

Index > Heap > IEEE floating point format is silly

Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
I never really understood this format --> why does it need a 128 exponent when the mantissa can only hold about 16777216 values? That is ridiculously huge!

And on top of that, they claim it has "limited precision". Of course, it would be limited with a 32 exponent (or a 16 exponent) too, but at least much less so. It's so limited because we waste it... that's like most things out there -- they always claim memory is limited, when in fact they waste it (take most textures: even simple average compression would make them about 50% smaller, especially because of all those silly 0s in paks and other files. Where do they come from? My only answer is waste. Besides, this kind of simple compression would even be faster -- RAM and especially hard drives are slow for large files, and a simple RLE is almost always faster!).

Back to the FPU -- okay, having said the 32-bit format is designed to waste precision, let's go to 64-bit. This is MUCH sillier: a 1024 exponent. When will you ever need 1024 binary digits to the left or right of the point? It seems extremely silly to me. Let's not even get into the 80-bit extended precision format...

Sorry if I sound weird, I just don't know why it was designed so wastefully. None of the programs I have seen ever needed that kind of exponent, but a 26 or 27 bit mantissa would be better for precision. I see it as a wasteful format -- that's why I prefer fixed-point arithmetic. Not because it works with integers, but because I design the format myself, and the IEEE format really sucks: it wastes so many bits that could have gone towards better precision.

Some of you will say the FPU is fast with SSE and so on... but what stops Intel/AMD/etc. from doing the same with integers (a better MMX and better instructions)? Integers will always be faster; the lack of such an implementation is probably what produces false statements like "the FPU is faster". By its nature, it is NOT. It's as if you said: "the 80386 has protection, and even so it's FASTER than the 8086 -- but that doesn't mean protection is faster than no protection." If Intel had built the 8086 with the same clock speed as the 80386 (and the caches, pipelines, etc.), it would have been faster than that protected 386. So protection really is slower! It's just that Intel did not take the time to improve the 8086. This was only an example (the FPU plays the role of "the protection" here).
Post 30 May 2006, 18:39

revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17251
Location: In your JS exploiting you and your system
Quote:
When will you ever need 1024 binary digits to the left or right of the point?
Sometimes intermediate values can get quite large. I noticed it especially when I was doing ray-tracing graphics: the scale of things could get rather mind-boggling at times, and I would see errors from the calculations in the pictures if I used only single precision.

What size exponent field do you think would be suitable?

It is mostly a trade-off with what programs will most likely need. Some require more precision, others more scale, sometimes more of both. It all depends on your application.
Post 30 May 2006, 19:31

Borsuc
Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Most programs use small scales, since that gives more precision (and even those that appear to use "large scales" often convert internally -- I've seen a lot of that).

And at large scales, for example, single precision can't use the value 16777217... it simply cannot represent it, and it falls back to 16777216 (the nearest representable value under round-to-nearest-even). There is even less chance of using fractional values.
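As a quick check (a minimal C sketch, assuming an IEEE 754 single with a 24-bit significand; this is an editor-added illustration, not part of the original post):
Code:
#include <stdio.h>

int main(void)
{
    /* 16777217 = 2^24 + 1 needs 25 significant bits, one more than a
       single-precision significand has, so it cannot survive the
       conversion to float.                                            */
    float f = 16777217.0f;
    printf("%.1f\n", f);                          /* prints 16777216.0 */
    printf("%d\n", 16777217.0f == 16777216.0f);   /* prints 1          */
    return 0;
}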

I think a suitable exponent for single precision (and perhaps for double precision too) would be about 32 (or 16, I'm not sure) to the left/right. Even 32 might be a bit much (considering that even the largest exponent SHOULD still allow fractional parts, no?). But LARGE values in floating-point arithmetic are very BAD -- not only will they be unable to represent fractional values, they will also be unable to represent ALL the integer values (my example with 16777217).

With an exponent of 16 you would only have 65536 as the maximum "integer" part, but you would still have a "fractional part". It's better if it scales down rather than up in FP arithmetic.

Having a sufficiently large exponent means you can't represent fractional values, nor even all the integer values -- that's why I don't see the point (it's better to use integers there, no?).

A 128 exponent is nothing compared to 1024 binary digits... Imagine how much precision and how many integer values you lose! There's no need for such large exponents. Instead, the mantissa should get more bits for precision.
Post 30 May 2006, 19:54

Borsuc
Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
Am I the only crazy one to notice this? Smile
Post 30 May 2006, 21:02

Tomasz Grysztar
Assembly Artist
Joined: 16 Jun 2003
Posts: 7715
Location: Kraków, Poland
I think you miss the point of floating point formats.
Post 30 May 2006, 22:18

Borsuc
Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
I know, there's gotta be something here, but I just don't get it. Question
Can you explain it?

thanks Wink
Post 30 May 2006, 22:22

donkey7
Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska
grey beast:
i think your algorithms are silly, not the ieee format itself. the exponent sizes look fine to me. btw: nothing will suit everyone -- some people want a bigger exponent, others want a bigger mantissa.

i think the only disadvantages of the ieee format are the presence of nans and denormalised numbers. if i designed the format, it would look something like this:
1000000...0000 - minus zero
0000000...0000 - plus zero
1111111...1111 - minus infinity
0111111...1111 - plus infinity

and all other combinations will be normal numbers (with one hidden bit).

the problem with floats is that mathematical identities are no longer true in the fpu world. for example a/c*c is not necessarily equal to a, as it would be in mathematics. so floats need special treatment.
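A tiny C sketch of that effect (an editor-added illustration, not from the post; c = 49 is an arbitrary value that happens to show it with IEEE 754 doubles):
Code:
#include <stdio.h>

int main(void)
{
    /* a/c*c is not always a in floating point. */
    double a = 1.0, c = 49.0;
    printf("%d\n", a / c * c == a);    /* 0 here: not equal            */
    printf("%.17g\n", a / c * c);      /* a value just below 1.0       */
    return 0;
}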

imagine we have a float with a 15-bit mantissa and we want to sum up one value of 1,000,000 and a million values of 1. we can do this in at least two ways:
a) 1,000,000 + 1 + 1 + 1 + 1 + ...
b) 1 + 1 + 1 + ... + 1,000,000

let's trace both examples:
a) 1,000,000 + 1 = 1,000,000 (because we have only 15 bits of precision)
1,000,000 + 1 = 1,000,000
... 999,997 iterations as above ...
and finally 1,000,000 + 1 = 1,000,000
we see that our result is far from correct Smile
b) 1 + 1 = 2
2 + 1 = 3
... 999,997 iterations ...
and finally 1,000,000 + 1,000,000 = 2,000,000
in fact it will differ somewhat (with so few bits the running sum of the 1s loses some accuracy of its own), say it comes out around 2,000,135, but as we see the result is much more correct!

so we see that the order of calculations is much more important than the mantissa size!!!
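The same effect shows up with real single-precision floats; here is a small C sketch (an editor-added example, with 100,000,000 as the big value so that an added 1 is smaller than half a ULP and gets rounded away):
Code:
#include <stdio.h>

int main(void)
{
    /* Order matters: add the big value first vs. last. */
    float big_first = 1.0e8f;
    for (int i = 0; i < 1000000; i++)
        big_first += 1.0f;             /* each 1 is lost: the ULP is 8 here */

    float ones = 0.0f;
    for (int i = 0; i < 1000000; i++)
        ones += 1.0f;                  /* exact: the sum stays below 2^24   */
    float big_last = ones + 1.0e8f;

    printf("big value first: %.1f\n", big_first);   /* 100000000.0 */
    printf("big value last : %.1f\n", big_last);    /* 101000000.0 */
    return 0;
}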

after thinking a bit i came up with some methods to preserve most of the accuracy in some cases:
a) we have to sum up one million random floats. we can sum the numbers from left to right, but that is very inaccurate (say we have 123523523 + 0.0000235: the result will be 123523523, so we have a big loss of accuracy). a much better strategy is to take the two numbers with the smallest exponents, sum them, replace them with the result, and repeat until one value remains. we can use e.g. a heap to do this efficiently.

pseudocode:
1. insert all values into the heap
2. a = extract-min
3. b = extract-min
4. c = a + b
5. insert c into the heap
6. if more than one value remains in the heap then goto 2
7. return extract-min

and we end up with just about the most accurate sum this precision can give us.

ps. extract-min returns the value with the smallest exponent (smallest absolute value).
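A runnable version of this idea might look like the following (an editor-added C sketch, not donkey7's code; ordering by fabsf stands in for comparing exponents, and the inputs reuse the 100,000,000-plus-a-million-ones example from above):
Code:
#include <stdio.h>
#include <math.h>

static float heap[1000001];
static int   count;

static void swap(int i, int j) { float t = heap[i]; heap[i] = heap[j]; heap[j] = t; }

/* insert a value, keeping the smallest |value| at the top */
static void push(float x)
{
    int i = count++;
    heap[i] = x;
    while (i > 0 && fabsf(heap[(i - 1) / 2]) > fabsf(heap[i])) {
        swap(i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

/* extract-min: remove and return the value of smallest magnitude */
static float pop(void)
{
    float top = heap[0];
    heap[0] = heap[--count];
    for (int i = 0;;) {
        int l = 2 * i + 1, r = 2 * i + 2, m = i;
        if (l < count && fabsf(heap[l]) < fabsf(heap[m])) m = l;
        if (r < count && fabsf(heap[r]) < fabsf(heap[m])) m = r;
        if (m == i) break;
        swap(i, m);
        i = m;
    }
    return top;
}

int main(void)
{
    push(1.0e8f);                      /* one big value ...             */
    for (int i = 0; i < 1000000; i++)
        push(1.0f);                    /* ... and a million ones        */

    while (count > 1) {                /* combine the two smallest      */
        float a = pop(), b = pop();
        push(a + b);
    }
    printf("heap-ordered sum = %.1f\n", pop());   /* 101000000.0        */
    return 0;
}
In this scenario a plain left-to-right sum loses every one of the 1s, while the smallest-first order keeps them all.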

b) subtraction is analogous, and you can even mix subtractions and additions, because extract-min doesn't look at the sign (it relies on the exponent, not on the sign).

c) multiplication is a bit more complicated. instead of the exponent we must count the rcz (rightmost consecutive zeroes); bsf can do this. the algo is the same as for addition, but extract-min returns the value with the biggest rcz.

d) for division you can use a simple trick. consider the expression:
a = b / c / d / e / f
it can be rewritten as:
a = b / (c * d * e * f)
so you do the multiplications first and then a single division.


those are just a few thoughts of mine, but remember: first improve your algorithm, *then* the implementation!!!

edit:
rcz = rightmost consecutive zeroes in the mantissa, of course Smile
Post 31 May 2006, 15:27

revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17251
Location: In your JS exploiting you and your system
NaN's are useful for tracking errors/overflow/underflow without having to check every intermediate result.
Post 31 May 2006, 16:34

donkey7
Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska
but you have flags. if the fpu were done the normal way, i mean with registers and flags instead of the silly stack, then nans would be useless. and afair nans are handled much more slowly.
Post 31 May 2006, 17:34

f0dder
Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
Quote:

and afair nans are handled much more slowly.

Is that a problem, though? NaN's aren't supposed to show up very often.

And doesn't the FPU support throwing exceptions? Dunno if SSE does as well... it's been a while since I dug into those parts of the books Smile
Post 31 May 2006, 17:58

revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17251
Location: In your JS exploiting you and your system
Quote:
but you have flags
The flags only show if the whole calculation went wrong or right, but there is no way to trace the memory to see where the last good value ends and the next bad values begin. Checking the flags after each operation is inefficient. So the NaN's are an important feature. Also, they take up very little room in the number space.
Post 01 Jun 2006, 03:04

donkey7
Joined: 31 Jan 2005
Posts: 127
Location: Poland, Malopolska
Quote:

Checking the flags after each operation is inefficient.

that's because of the fpu implementation. with the cpu you don't pay such a big time penalty. if intel designed the fpu like the cpu (registers instead of a stack) it would be just as efficient as the cpu.
Post 01 Jun 2006, 07:59

tom tobias
Joined: 09 Sep 2003
Posts: 1320
Location: usa
Quote:

the lack of such an implementation is probably what produces false statements like "the FPU is faster". By its nature, it is NOT.

Historically, i.e. 20 years ago, the fpu resided on a separate chip. Designed by a different team from the architects of the cpu, it was irritating to learn that stack manipulation, rather than register usage, was required. Eventually, around 1989, this conceptually archaic fpu was integrated with the cpu on a single chip (486), and it is from this date that the myth of the INCREASED execution speed of the fpu was born.

In fact, a given algorithm could have been executed somewhat faster if a portion of the task could have been executed by the fpu CONCURRENTLY with the portion executed by the cpu, thereby reducing the overall task time. I attempted, UNSUCCESSFULLY, in 1990, to implement such a scenario with the in-line computation of a fast Fourier transform, executing part of the algorithm with the integer unit and the other half with the floating point processor. The overhead of managing the two parts was significant, so little or no gain was achieved by me. Since 1993, when I abandoned assembly language programming, cpu architecture has made such dramatic improvements that it now seems, in retrospect, that my thinking at the time represented a completely futile exercise. I imagined that it should be possible to harness two oxen to plow the field in half the time...

Today we have dual integer execution units on a single chip, together with the fpu! Wow. Employing a split radix, the overhead may now be less onerous: assigning, for example, the radix-2 code to the fpu, the radix-4 code to the second cpu, and the overhead, assignment and recombination to the first cpu. Wow. Think of the time one could waste working on such a problem! Desktop cpus today are so fast, compared with the "supercomputers" of twenty years ago, that the cpu could compute a DFT, let alone an FFT, in milliseconds, obviating the need for further speed enhancements. Additional advantage: contemporary cpus generate so much heat that one can literally use them to heat a room.

I look forward to the day when some company comes along with a MUCH SIMPLER, smaller, slower cpu, without cache, without fpu, without branch prediction, and with a much less sophisticated instruction set, so that mundane tasks like internet browsing can be accomplished with ease at much lower cost. I wonder what kind of efficiency one could gain from a cluster of four integer units on a single chip, having NO cache, with all data stored in main memory (DDR2 dual channel 667, wow) to reduce the overhead needed to determine coherency?
Post 01 Jun 2006, 10:33

vid
Verbosity in development
Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
tom tobias: tried RISCs? i am now exploring ARM processors, and they seem simple but powerful - like fasm
Post 01 Jun 2006, 12:30

Borsuc
Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
donkey7: I don't know why I just don't get it... sorry again to bother you.

I don't see, when you have (let's say) a 53-bit mantissa for double precision, why you need a 1024 exponent. Exponents larger than 53 will only give you integers -- i.e. they cannot represent fractions. From that point on, I see no advantage for floats over ints. And an even larger exponent will result in "skipping some integer values", so it is even worse than integers! That's why I don't get it.

And that is only an exponent of 56 or 57... imagine an exponent of 1024: you get approximately 1024 binary digits, with approximately 971 zeros at the end. Imagine how much accuracy you lose, and what a big number you have. There's no need for it. Just think about it: if the exponent's value is larger than the number of mantissa bits, the value can no longer represent fractions. An even larger exponent will skip some integer values (i.e. worse than integers).
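The gaps being described are easy to measure directly; here is a small C sketch (an editor-added illustration, not from the thread) using nextafter to show the spacing between neighbouring representable values:
Code:
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Single precision just above 2^25: neighbouring values are 4 apart,
       so three out of every four integers cannot be represented.         */
    float f = 33554432.0f;                                 /* 2^25 */
    printf("next float after 2^25: %.1f\n", nextafterf(f, INFINITY));

    /* Double precision near the top of its exponent range: the gap
       between neighbouring values is astronomically large.               */
    double d = 1.0e300;
    printf("gap near 1e300: %g\n", nextafter(d, INFINITY) - d);
    return 0;
}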

Say you have a 64-bit mantissa. What does a 16383 exponent even mean... Shocked Shocked Shocked

With an exponent of 64 there, you have NO fractional bits (all of the mantissa belongs to the integer part). Now imagine a 16383 exponent, which adds approximately 16319 zeros Shocked -- such a large number, but such LOW precision... All those zeros can never be 1s, because they come from the exponent, not the mantissa.

That's why I don't get it. In my opinion, the exponent should cover at most the mantissa's size. For example, if you have a 53-bit mantissa, a suitable exponent range (in my opinion) would be 64 (6 bits) or 32 (5 bits). Why 32? Because I think you should still be able to use fractional values even at the maximum exponent, no?

With a value of 53 in the exponent you have NO fractional part at all, and a 63 exponent (the maximum) goes beyond even the integers -- that's why a 64 range might already be a bit much. But I think I still don't get it... Confused

I don't see the point of such large exponents (16383 Shocked). When are they used? Or is there another trick in the middle?

I know a little about dynamic range, but I don't get it. Why does it need to be so large? After all, the absolute precision you lose grows with the number's size: the larger the number, the bigger the gaps between representable values. I find it silly because I don't see the point of such large numbers; it seems way off to me. Can anyone explain this? I just don't get it.
Post 01 Jun 2006, 16:23

revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17251
Location: In your JS exploiting you and your system
Avogadro's number (6.022136736*10^23) is known to about 10 digits of accuracy; there is no need to list/store extra zero digits at the end because they are unknown. Should I write this?
Code:
602213673600000000000000    
That is unnecessary and also inaccurate, because the zeros at the end are not known values. The same goes for FP formats: large and small numbers don't require us to store all those extra digits, because we don't need them and they are unknown anyway.

We can still do 7-digit-accuracy calculations with large numbers like that using only single precision. E.g. multiply Avogadro's number by the weight in grams and divide by the atomic weight to calculate the number of atoms in a substance. You would need at least 79-binary-digit integers to do that calculation, and you might also lose a few decimal places if the number of grams is very small. So we move the point around (floating point, see) to suit the required scale of the number without sacrificing our accuracy. Note the difference here between accuracy and precision; they are not the same thing.
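That calculation in code (a minimal editor-added C sketch; the one-milligram-of-carbon-12 inputs are made-up example values, not from the post):
Code:
#include <stdio.h>

int main(void)
{
    /* atoms = N_A * mass / atomic_weight.  The result is around 5e19,
       far outside any 32-bit integer, yet single precision carries it
       with about 7 significant digits.                                 */
    float avogadro = 6.0221367e23f;
    float grams    = 0.001f;     /* one milligram of the substance      */
    float atomic_w = 12.0f;      /* carbon-12                           */
    printf("atoms = %e\n", avogadro * grams / atomic_w);  /* ~5.0184e+19 */
    return 0;
}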

Counting by repeatedly adding 1 to a number is best suited to integers -- though try getting your computer to count up to a 64-bit number one step at a time; you will be waiting a very long time for the result! FP is usually not used for that kind of thing. Like I said above, it all depends on your application.
Post 01 Jun 2006, 16:57

Borsuc
Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
revolution wrote:
We can still do 7-digit-accuracy calculations with large numbers like that using only single precision. E.g. multiply Avogadro's number by the weight in grams and divide by the atomic weight to calculate the number of atoms in a substance. You would need at least 79-binary-digit integers to do that calculation, and you might also lose a few decimal places if the number of grams is very small. So we move the point around (floating point, see) to suit the required scale of the number without sacrificing our accuracy. Note the difference here between accuracy and precision; they are not the same thing.
I don't really understand. What's with the 79-binary-digit integers? Can you explain with a diagram or something like that, please? Smile

And that's not as bad as a 1024 or 16383 exponent. That gives a MUCH too big number, without much precision.

But I know there must be a reason for it, and I know you explained it there; sorry that I didn't understand.


Sorry again for all the confusion.

EDIT: I got it. Thanks Smile
But will those big numbers actually get used somewhere, or need more accuracy? In fact, what if I wanted to represent some fractional parts?
Post 01 Jun 2006, 17:10

revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17251
Location: In your JS exploiting you and your system
The_Grey_Beast wrote:
But will those big numbers actually get used somewhere, or need more accuracy?
The accuracy never changes no matter what the exponent is. For SP we always get ~7 digits of accuracy, for DP ~15 digits, and for EDP ~19 digits. Not many numbers are known to 19 digits of accuracy; certainly no measured value that I am aware of has achieved a provable accuracy of more than 10 digits. So DP (15 digits) should be fine for most real-world high-accuracy problems, and SP for most real-world normal problems.

Things like PI or E, of course, can be calculated to staggering numbers of decimal places but digits beyond about the first 10 digits are mostly meaningless except in very specialised situations.
The_Grey_Beast wrote:
In fact, what if I wanted to represent some fractional parts?
That is precisely the reason FP formats exist: to represent both large and small numbers with a specified accuracy. 1.660540177x10^-24 etc. can also be represented in DP. BTW, what do you get when you multiply 1.660540177x10^-24 by Avogadro's number? Can you find the answer using only an integer format or a fixed point format? The difference in scale of the two numbers is very large, so we need enough bits in the exponent field to cover the possible range that could occur. What if you divide instead of multiply? Now the exponent goes even further into either positive or negative territory. This can happen in an internal computation before the final answer is produced, so overflow/underflow must be considered when choosing a suitable exponent size.
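For what it's worth, that product comes out very close to 1; a minimal editor-added C sketch using the constants quoted in this thread:
Code:
#include <stdio.h>

int main(void)
{
    /* One atomic mass unit in grams times Avogadro's number: the
       exponents (10^-24 and 10^23) nearly cancel, and the product
       comes out very close to 1 -- trivial for a double, hopeless
       for any fixed point format that has to span both scales.    */
    double amu      = 1.660540177e-24;
    double avogadro = 6.0221367e23;
    printf("%.9f\n", amu * avogadro);
    return 0;
}
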
Post 01 Jun 2006, 19:04
Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.