ok, here we go:
Computers use several different kinds of numbers. The workhorse on most home machines is the 64-bit double-precision floating-point format (IEEE 754), stored and manipulated as binary and convertible to both hex and base 10 with great accuracy. Problems creep in, though, because only a fixed 64 bits of memory are used to store each number.
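A quick way to see the "limited memory" problem in action (a minimal Python sketch; any language with 64-bit doubles behaves the same way):

```python
# 0.1, 0.2, and 0.3 all have infinite binary expansions, so each one is
# rounded to the nearest 64-bit double before we ever add them.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
```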
So how is each number represented? All 64 bits are stored as binary: 1 bit holds the sign (+/-), the next 11 bits hold the exponent, and the remaining 52 hold the "mantissa," the string of significant bits. There is also an implied 1 before the binary point of every (normalized) number in this representation. So a given number has the form

    ±1.b1b2...b52 x 2^p

where the b's are the mantissa bits and p is the exponent.
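If you want to see the three fields for yourself, here's a minimal Python sketch (the struct module just reinterprets the 8 bytes of a double as a raw integer; the field widths follow the layout above):

```python
import struct

def fields(x: float):
    """Split a 64-bit double into its sign, exponent, and mantissa bits."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # raw 64 bits
    sign     = bits >> 63                # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)    # 52 bits after the implied leading 1
    return sign, exponent - 1023, f"{mantissa:052b}"

# 0.75 = +1.1 (binary) x 2^-1, so: sign 0, exponent -1, mantissa 1000...0
print(fields(0.75))  # (0, -1, '1000000000000000000000000000000000000000000000000000')
```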
So why does the computer decide that 0.99999999999999999 (seventeen 9's) is exactly 1?
When numbers are computed in a machine, the exact answer is often more precise than the format can hold: it needs more than the 53 significant bits (1 implied + 52 stored) available. Something must be done with the 54th bit and beyond. There are two options here: chop the extra bits right off like they never existed, or the much more accurate option, rounding.
Rounding in base 10 means rounding up when the next digit is 5 or higher; in binary, if the first bit past the 52 stored mantissa bits is a 1, the number gets rounded up. For a number like 0.999...9, whose binary expansion starts with a long run of 1's, this causes a cascade: the carry from rounding up ripples through every 1 in the mantissa, and the stored result becomes exactly 1.
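You can watch the cascade happen in Python (the rounding here is done by the parser when it reads the literal, before any arithmetic):

```python
# Each literal is rounded to the nearest double the moment it's parsed.
print(0.99999999 == 1.0)           # False: 8 nines still fits below 1.0
print(0.99999999999999999 == 1.0)  # True: with 17 nines, the carry from the
                                   # rounded-off bits ripples all the way up
                                   # and the stored value is exactly 1.0
```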
So there's error present here, since 0.999...9 != 1 mathematically. How big is the error?
Seeing that I've already explained this way too in depth, take my word that in 64-bit double-precision systems the relative error from rounding is at most (2^-52)/2 = 2^-53 for any (normal) number, i.e. half of the machine epsilon.
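You can check that bound yourself (small Python sketch; sys.float_info.epsilon is the gap between 1.0 and the next representable double):

```python
import sys

eps = sys.float_info.epsilon  # 2**-52, the gap from 1.0 to the next double
print(eps == 2**-52)          # True
# Round-to-nearest means the worst-case relative error is half that gap:
print(eps / 2)                # 1.1102230246251565e-16
# e.g. 1 + eps/2 lands exactly halfway and rounds back to 1.0
# (ties go to the even mantissa):
print(1.0 + eps / 2 == 1.0)   # True
```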
p.s. Thanks for helping me study for my numerical analysis test tomorrow!