help-octave mailing list
Re: Interpretation of result


From: Przemek Klosowski
Subject: Re: Interpretation of result
Date: Tue, 18 Sep 2012 17:32:46 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0

On 09/17/2012 02:59 PM, JuanPi wrote:
> Can anybody explain this to a simple mortal who did not take a
> rigorous numerical calculus class?
>
> octave:89> 99999999999999999 - 99999999999999998
> ans = 0
>
> (that is 17 digits) ...

Common wisdom credits Octave's double-precision numbers with 16+ digits of precision. This is not quite right---as with many 'advertised features', you should read it as 'guaranteed not to exceed' :) IEEE 754 double-precision floating-point numbers have a 52-bit mantissa, so their relative precision is 2^(-52), also known as 'eps'. That is about 2.2e-16, which is where the 'roughly 16 digits' figure comes from---it doesn't translate into a nice round number of decimal digits. The double-precision FP representation is explained nicely in
http://en.wikipedia.org/wiki/Double-precision_floating-point_format
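The same arithmetic is easy to check in Python, whose float is the same IEEE 754 double (a sketch; any language exposing native doubles behaves identically):

```python
import sys

# eps is the spacing between 1.0 and the next representable double: 2^-52
assert sys.float_info.epsilon == 2.0 ** -52
print(2.0 ** -52)            # ~2.22e-16, the origin of the "about 16 digits" rule

# Both 17-digit integers round to the same double (10^17),
# so their difference as doubles is exactly zero.
a = float(99999999999999999)
b = float(99999999999999998)
print(a - b)                 # 0.0
```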

The numbers you are interested in would require more than 56 bits of precision: log(99999999999999999)/log(2) is 56.47. Since the bits required for such precision aren't there, many consecutive integers end up encoded by the same IEEE binary representation, as you can see by asking Octave to print all the bits:

format bit; a=(99999999999999991:99999999999999999)'
a =
  0100001101110110001101000101011110000101110110001001111111111111
  0100001101110110001101000101011110000101110110001001111111111111
  0100001101110110001101000101011110000101110110001001111111111111
  0100001101110110001101000101011110000101110110001001111111111111
  0100001101110110001101000101011110000101110110001001111111111111
  0100001101110110001101000101011110000101110110001001111111111111
  0100001101110110001101000101011110000101110110001001111111111111
  0100001101110110001101000101011110000101110110001001111111111111
  0100001101110110001101000101011110000101110110001010000000000000
  0100001101110110001101000101011110000101110110001010000000000000
  0100001101110110001101000101011110000101110110001010000000000000
  0100001101110110001101000101011110000101110110001010000000000000
  0100001101110110001101000101011110000101110110001010000000000000
  0100001101110110001101000101011110000101110110001010000000000000
  0100001101110110001101000101011110000101110110001010000000000000
  0100001101110110001101000101011110000101110110001010000000000000
  0100001101110110001101000101011110000101110110001010000000000000

As you can see, the integers in that range aren't exactly representable in IEEE 754 form. What's worse, note that the range 99999999999999991:99999999999999999 should contain 9 numbers, yet Octave produces 17---it can't even keep track of the size of that interval, because its endpoints round to doubles that are 16 apart. This isn't a bug in Octave, just a consequence of the floating-point representation.
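Octave's `format bit` output, and the 17-element surprise, can both be reproduced in Python with the standard `struct` module (a sketch; `double_bits` is a helper name chosen here, not an Octave or Python builtin):

```python
import struct

def double_bits(x):
    """The 64-bit IEEE 754 pattern of x, as Octave's 'format bit' prints it."""
    return format(struct.unpack('>Q', struct.pack('>d', float(x)))[0], '064b')

# 99999999999999998 and 99999999999999999 collapse onto the same double, 10^17
assert double_bits(99999999999999998) == double_bits(10**17)

# The range endpoints round to doubles that are 16 apart,
# so a step-1 range over them holds 16 + 1 = 17 elements.
lo = float(99999999999999991)   # rounds down to 99999999999999984
hi = float(99999999999999999)   # rounds up to 100000000000000000
print(hi - lo)                  # 16.0
print(int(hi - lo) + 1)         # 17
```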

As the Wikipedia article indicates, every integer up to 2^53, or 9,007,199,254,740,992, is exactly representable as a double; 2^53 + 1 is the first one that isn't. That limit is more than ten times smaller than your numbers, and anything larger loses precision and will not give reliable integer-like results.

format bit; a=2^53+(-5:5)'
a =
  0100001100111111111111111111111111111111111111111111111111111011
  0100001100111111111111111111111111111111111111111111111111111100
  0100001100111111111111111111111111111111111111111111111111111101
  0100001100111111111111111111111111111111111111111111111111111110
  0100001100111111111111111111111111111111111111111111111111111111
  0100001101000000000000000000000000000000000000000000000000000000
  0100001101000000000000000000000000000000000000000000000000000000
  0100001101000000000000000000000000000000000000000000000000000001
  0100001101000000000000000000000000000000000000000000000000000010
  0100001101000000000000000000000000000000000000000000000000000010
  0100001101000000000000000000000000000000000000000000000000000010

Now, Octave has a uint64 data type which of course does not spend the first 12 bits on a sign and an 11-bit exponent, so it has 12 more bits of precision than the double's mantissa. It allows exact integer arithmetic on numbers up to 2^64 - 1, or 18,446,744,073,709,551,615. See how the uint64 numbers increment by one and don't show any gaps or repetitions:

format bit; a=uint64(2^53)+(-5:5)'
a =
  0000000000011111111111111111111111111111111111111111111111111011
  0000000000011111111111111111111111111111111111111111111111111100
  0000000000011111111111111111111111111111111111111111111111111101
  0000000000011111111111111111111111111111111111111111111111111110
  0000000000011111111111111111111111111111111111111111111111111111
  0000000000100000000000000000000000000000000000000000000000000000
  0000000000100000000000000000000000000000000000000000000000000001
  0000000000100000000000000000000000000000000000000000000000000010
  0000000000100000000000000000000000000000000000000000000000000011
  0000000000100000000000000000000000000000000000000000000000000100
  0000000000100000000000000000000000000000000000000000000000000101
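The contrast between the two representations can be seen in Python by printing the raw bits (a sketch; Python ints are arbitrary precision, so `format(n, '064b')` plays the role of uint64's bit pattern, while `struct` gives the double's; both helper names are chosen here for illustration):

```python
import struct

def double_bits(x):
    # IEEE 754 double bit pattern, as printed by Octave's 'format bit'
    return format(struct.unpack('>Q', struct.pack('>d', float(x)))[0], '064b')

def uint64_bits(n):
    # all 64 bits hold the integer value, like Octave's uint64
    return format(n, '064b')

n = 2 ** 53
# as doubles, n and n + 1 share one bit pattern...
assert double_bits(n) == double_bits(n + 1)
# ...as 64-bit integers they differ in the lowest bit, as expected
assert uint64_bits(n) != uint64_bits(n + 1)
print(uint64_bits(n)[-4:], uint64_bits(n + 1)[-4:])  # 0000 0001
```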


