flat assembler
Message board for the users of flat assembler.
Heap > IEEE floating point format is silly

revolution
Quote: When will you need 1024 binary digits to the left/right? What size exponent field do you think would be suitable? It is mostly a tradeoff with what programs will most likely need. Some require more precision, others more scale, sometimes more of both. It all depends on your application. 

30 May 2006, 19:31 

Borsuc
Most programs use small-scale values, as they give more precision (and even those that appear to use large-scale values may still convert them; I've seen a lot).
And at large scale, for example, 32-bit single precision can't represent the value 16777217 at all; it rounds to 16777216. There's less of a chance to use fractional values. I think a suitable exponent for single precision (and perhaps double precision too) would be about 32 (or 16, I don't know) to the left/right. 32 might be a bit much (considering that even the largest exponent SHOULD still represent fractional parts, no?).

But LARGE values in floating-point arithmetic are very BAD: not only are they unable to represent fractional values, they are also unable to represent ALL the integer values (my example with 16777217). With a 16-bit exponent to the left you will only have 65536 as the max "integer" part, but you will still have a fractional part. It's better if it scales down rather than up in FP arithmetic.

Having a sufficiently large exponent means you won't be able to represent fractional values, and not even all integer values, which is why I don't see the point (it's better to use integers there, no?). A 128 exponent is nothing compared to 1024 binary digits. Imagine how much precision and how many integer values you'd lose! There's no need for such large exponents. Instead, the mantissa should be more precise (more bits).
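The 16777217 gap is easy to check. With a 24-bit significand, 2^24 + 1 has no exact encoding, and round-to-nearest-even sends it back down to 2^24. A quick sketch in Python, using `struct` to round doubles through single precision (the `f32` helper is just for this demonstration):

```python
import struct

def f32(x):
    """Round a Python float to IEEE single precision and back."""
    return struct.unpack('f', struct.pack('f', x))[0]

print(f32(16777216.0))  # 16777216.0 -- 2^24, exactly representable
print(f32(16777217.0))  # 16777216.0 -- halfway case, ties round to even
print(f32(16777218.0))  # 16777218.0 -- exactly representable again
```

So above 2^24 single precision represents only every second integer, which is the "skipping integer values" effect described here.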

30 May 2006, 19:54 

Borsuc
Am I the only crazy one to notice this?


30 May 2006, 21:02 

Tomasz Grysztar
I think you miss the point of floating point formats.


30 May 2006, 22:18 

Borsuc
I know, there's gotta be something here, but I just don't get it.
Can you explain it? Thanks.

30 May 2006, 22:22 

donkey7
grey beast:
i think that your algos are silly, not the ieee format itself. exponent sizes are optimal for me. btw: nothing will be suitable for all; i mean, somebody wants bigger exponents and somebody wants a bigger mantissa. i think the only disadvantages of the ieee format are the presence of nans and denormalised numbers. if i designed the format it would look something like this:

1000000...0000 - minus zero
0000000...0000 - plus zero
1111111...1111 - minus infinity
0111111...1111 - plus infinity

and all other combinations would be normal numbers (with one hidden bit).

the problem with floats is that mathematical identities are not true in the fpu world. for example a/c*c is not equal to a, as we know from mathematics, so we must treat floats specially. imagine we have a float with a 15-bit mantissa and we want to sum up one value of 1.000.000 and a million values of 1. we can do this in at least two ways:

a) 1.000.000 + 1 + 1 + 1 + ...
b) 1 + 1 + 1 + ... + 1.000.000

let's trace both:

a) 1.000.000 + 1 = 1.000.000 (because we have only 15 bits of precision), 1.000.000 + 1 = 1.000.000, ... 999.997 more iterations as above ... and finally 1.000.000 + 1 = 1.000.000. we see that our result is far from correct.

b) 1 + 1 = 2, 2 + 1 = 3, ... 999.997 more iterations ... and finally 1.000.000 + 1.000.000 = 2.000.000. in fact it will differ somewhat, say it results in 2.000.135, but as we see the result is much more correct!

so we see that the order of calculations is much more important than the mantissa size!!! after thinking a bit i invented some methods to preserve the most accuracy in some cases:

a) we have to sum up one million random floats. we can sum the numbers from left to right but it's very inaccurate (say we have 123523523 + 0.0000235; the result will be 123523523, so we have a big loss of accuracy). a much better strategy is to sum the two numbers with the smallest exponents, replace them with the result, and repeat those steps until we have the final result. we can use e.g. a heap to do this efficiently. pseudocode:

1. insert all values into the heap
2. a = extract-min
3. b = extract-min
4. c = a + b
5. insert c into the heap
6. if more than one value remains in the heap then goto 2
7. return extract-min

and we have the most accurate value we can achieve. ps. extract-min returns the value with the smallest exponent (smallest absolute value).

b) subtraction is analogous, and you can even mix subtractions and additions, because extract-min doesn't look at the sign (it relies on the exponent, not the sign).

c) multiplication is a bit more complicated. instead of the exponent we must compute rcz (rightmost consecutive zeroes); bsf can do this. the algo is the same as for addition, but extract-min returns the value with the biggest rcz.

d) for division you must use a simple trick. consider the equation a = b/c/d/e/f. it can be rewritten as a = b/(c*d*e*f), and you do the multiplications first and then one division.

those are just a few of my thoughts, but remember: first you must improve your algorithm, *then* the implementation!!!

edit: rcz = rightmost consecutive zeroes in the mantissa, of course
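The heap method above can be sketched in a few lines of Python. This is a hedged illustration, not donkey7's actual code: the `f32` helper rounds every intermediate sum to single precision, standing in for his short-mantissa example, and it orders by absolute value rather than raw exponent (equivalent for this purpose):

```python
import heapq
import struct

def f32(x):
    """Round to IEEE single precision, simulating a short mantissa."""
    return struct.unpack('f', struct.pack('f', x))[0]

def naive_sum(values):
    """Left-to-right summation, rounding after every add."""
    s = 0.0
    for v in values:
        s = f32(s + v)
    return s

def heap_sum(values):
    """Always combine the two smallest magnitudes first."""
    heap = [(abs(v), v) for v in values]
    heapq.heapify(heap)
    while len(heap) > 1:
        _, a = heapq.heappop(heap)
        _, b = heapq.heappop(heap)
        c = f32(a + b)
        heapq.heappush(heap, (abs(c), c))
    return heap[0][1]

# One big value first, then many small ones: the naive order loses them all.
values = [33554432.0] + [1.0] * 100   # 33554432 = 2^25; float32 spacing is 4 here
print(naive_sum(values))  # 33554432.0 -- every +1 rounds away
print(heap_sum(values))   # 33554532.0 -- small values combined first, exact here
```

The small values accumulate exactly among themselves before they ever meet the large one, which is the whole point of the extract-min ordering.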

31 May 2006, 15:27 

revolution
NaN's are useful for tracking errors/overflow/underflow without having to check every intermediate result.
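The point is that a NaN, once produced, contaminates every later operation, so a single check at the end suffices. A minimal sketch (Python floats follow IEEE 754 doubles, so the propagation behaviour carries over):

```python
import math

nan = float('nan')  # what the FPU produces for 0/0, inf - inf, sqrt(-1), ...

# Chain several operations without checking any intermediate result:
bad = ((nan * 3.0) + 1.0) / 2.0 - 100.0
print(math.isnan(bad))  # True -- the NaN survived every step

# A normal input flows through the same pipeline untouched:
good = ((2.0 * 3.0) + 1.0) / 2.0 - 100.0
print(good)  # -96.5
```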


31 May 2006, 16:34 

donkey7
but you have flags. if the fpu were done in the normal way, i mean using registers and flags, not a silly stack, then nans would be useless. and afair nans are handled much more slowly.


31 May 2006, 17:34 

f0dder
Quote:
Is that a problem, though? NaNs aren't supposed to show up very often. And doesn't the FPU support throwing exceptions? Dunno if SSE does as well... it's been a while since I dug into those parts of the books.

31 May 2006, 17:58 

revolution
Quote: but you have flags 

01 Jun 2006, 03:04 

donkey7
Quote:
because of the fpu implementation. with the cpu you don't have such a big time penalty. if intel designed the fpu like the cpu (registers instead of a stack) then it would be as efficient as the cpu.

01 Jun 2006, 07:59 

tom tobias
Quote:
Historically, i.e. 20 years ago, the fpu resided on a separate chip. Designed by a different team from the architects of the cpu, it was irritating to learn that stack manipulation, rather than register usage, was required. Eventually, around 1989, this conceptually archaic fpu was integrated with the cpu on a single chip (486), and it is from this date that the myth of the INCREASED execution speed of the fpu was born.

In fact, a given algorithm could have been executed somewhat faster if a portion of the task could have been executed by the fpu CONCURRENTLY with the portion executed by the cpu, thereby reducing the overall task time. I attempted, UNSUCCESSFULLY, in 1990, to implement such a scenario with the inline computation of a fast Fourier transform, executing part of the algorithm with the integer unit and the other half with the floating point processor. The overhead of managing the two parts was significant, so little or no gain was achieved by me.

Since 1993, when I abandoned assembly language programming, cpu architecture has made such dramatic improvements that it now seems, in retrospect, that my thinking at the time represented a completely futile exercise. I imagined that it should be possible to harness two oxen to plow the field in half the time... Today we have dual integer execution units on a single chip, together with the fpu! Wow. Employing a split radix, the overhead may now be less onerous: assigning, for example, the radix-2 code to the fpu, the radix-4 code to the second cpu, and the overhead, assignment, and recombination to the first cpu. Wow. Think of the time one could waste working on such a problem!

Desktop cpus today are so fast compared with the "supercomputers" of twenty years ago that the cpu could compute a DFT, let alone an FFT, in milliseconds, obviating the need for further speed enhancements. Additional advantage: contemporary cpus generate so much heat that one can literally use them to heat a room.

I look forward to the day when some company comes along with a MUCH SIMPLER, smaller, slower cpu, without cache, without fpu, without branch prediction, and with a much less sophisticated instruction set, so that mundane tasks like internet browsing can be accomplished with ease, at much lower cost. I wonder what kind of efficiency one could gain from a cluster of four integer units on a single chip, having NO cache, with all data stored in main memory (DDR2 dual channel 667, wow) to reduce the overhead needed to determine coherency?

01 Jun 2006, 10:33 

vid
tom tobias: tried RISCs? i am now exploring ARM processors, and they seem simple but powerful - like fasm


01 Jun 2006, 12:30 

Borsuc
donkey7: I don't know why I just don't get it... sorry again to bother you.
I don't see, when you have (let's say) a 53-bit mantissa for double precision, why you need a 1024 exponent. Exponents larger than 53 result in integers, i.e. values unable to represent fractions. From that point on, I see no advantage for floats over ints. And an even larger exponent results in "skipping some integer values", so it is even worse than integers! That's why I don't get it. And this is only exponent 56 or 57... imagine exponent 1024: you will have approximately 1024 binary digits, with approximately 971 zeroes at the end. Imagine how much accuracy you lose, and what a big number you have. There's no need for it.

Just think a bit. If the exponent has a value larger than the mantissa's bit count, then the value is unable to represent fractions. An even larger exponent skips some integer values (i.e. worse than integers). Say you have a 64-bit mantissa. With exponent 64, you'll have NO fractional bits (all of the mantissa belongs to the integer part). Now imagine a 16383 exponent, which adds approximately 16319 zeroes: such a large number, but such LOW precision... All those zeroes are unable to be ones, because they belong to the exponent, not the mantissa.

That's why I don't get it. In my opinion, the exponent should cover at most the mantissa's size. For example, if you have a 53-bit mantissa, a suitable exponent (in my opinion) would be 64 (6 bits) or 32 (5 bits). Why 32? Because I think you should still be able to use fractional values even when the exponent is at its maximum, no? With a value of 53 in the exponent you already have NO fractional part, which is why 64 might be a bit much. But I think I don't get it... I don't see the point of such large exponents (16383). When are they used? Or is there another trick in the middle? I know something about dynamic range, but I don't get it. And why does it need to be so large?

After all, you lose absolute precision proportionally to the number's size, i.e. the larger the number, the more precision you lose. I find it silly because I don't see the point of such large numbers; it's way off in my opinion. Can anyone explain this? I just don't get it.
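The "no fractions, then skipped integers" effect described above is easy to see with ordinary doubles (53-bit significand); a quick sketch:

```python
# At 2^52 the spacing between adjacent doubles is exactly 1: fractions are gone.
print(2.0**52 + 0.5 == 2.0**52)    # True -- the 0.5 cannot be stored

# At 2^60 the spacing is 2^8 = 256: whole runs of integers are skipped.
print(2.0**60 + 1.0 == 2.0**60)    # True -- the +1 rounds away entirely
print(2.0**60 + 256.0 == 2.0**60)  # False -- 256 is the next representable step
```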

01 Jun 2006, 16:23 

revolution
Avogadro's number (6.022136736*10^23) is known to about 10 digits of accuracy; there is no need to list/store extra zero digits at the end because they are unknown. Should I write this?

Code:
602213673600000000000000

We can still do 7-digit-accuracy calculations with large numbers like that using only single precision. E.g. multiply Avogadro's number by the weight in grams and divide by the atomic weight to calculate the number of atoms of a substance. You would need at least 79-binary-digit integers to do that calculation, and you might also lose a few decimal places if the number of grams is very small. So we move the point around (floating point, see) to suit the required scale of the number without sacrificing our accuracy. Note the difference here between accuracy and precision; they are not the same thing.

Counting by repeatedly adding 1 to a number is best suited to integers; try getting your computer to count through a 64-bit number one step at a time and you will be waiting a very long time for the result! Usually FP is not used for such a situation. Like I said above, it all depends on your application.
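The atom-count calculation can be sketched in single precision, again using `struct` to round each step to 32 bits. The milligram-of-carbon-12 input is an illustrative assumption, not a figure from the thread:

```python
import struct

def f32(x):
    """Round a value to IEEE single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

avogadro = f32(6.0221367e23)  # ~10 known digits; single precision keeps ~7
grams = f32(0.001)            # one milligram of substance (illustrative)
atomic_weight = f32(12.0)     # carbon-12

atoms = f32(f32(avogadro * grams) / atomic_weight)
print(atoms)  # ~5.0184e+19 -- far beyond 64-bit integer range per the text, yet ~7 digits hold
```

The exponent field absorbs the scale (10^23 fits easily), leaving the full significand for the digits that are actually known.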

01 Jun 2006, 16:57 

Borsuc
revolution wrote: We can still do 7 digit accuracy calculations with large numbers like that using only single precision. eg. multiply Avogadro's number by weight in grams and divide by the atomic weight to calculate the number of atoms of a substance. You would need at least 79 binary digit integers to do that calculation and you might lose a few decimal points also if the number of grams is a very small number. So we move the point around (floating point see) to suit the required scale of the number without sacrificing our accuracy. Note the difference here between accuracy and precision, they are not the same thing.

And that's not as bad as a 1024 or 16383 exponent. That's a MUCH too big number, without much precision. But I know there's something to it, and I know you explained it there; sorry that I didn't understand. Sorry again for the confusion.

EDIT: I got it. Thanks. But will such big numbers actually get used somewhere, or need more accuracy? In fact, what if I wanted to represent some fractional parts?

01 Jun 2006, 17:10 

revolution
The_Grey_Beast wrote: But will such big numbers actually get used somewhere, or need more accuracy?

Things like PI or E, of course, can be calculated to staggering numbers of decimal places, but digits beyond about the first 10 are mostly meaningless except in very specialised situations.

The_Grey_Beast wrote: In fact, what if I wanted to represent some fractional parts?

01 Jun 2006, 19:04 


Copyright © 1999-2020, Tomasz Grysztar.
Powered by rwasa.