[Axiom-developer] [#167 Infinite Floats Domain] Float and DFloat

From: wyscc
Subject: [Axiom-developer] [#167 Infinite Floats Domain] Float and DFloat
Date: Mon, 13 Jun 2005 07:49:16 -0500


What is the domain 'FLOAT' if not already "infinite but bounded


From Tim Daly Sun, 12 Jun 2005 17:17:54 -0500<br>
From: Tim Daly<br>
Date: Sun, 12 Jun 2005 17:17:54 -0500<br>
Subject: infinite floats domain<br>
Message-ID: <address@hidden><br>

you're right, of course.<br>
my mind is mush at the moment.<br>
i've been driving all week.<br>


From Bill Page, Sun, 12 Jun 2005 21:24:48 -0400<br>
From: "Bill Page" <address@hidden><br>
Subject: infinite floats domain<br>
Message-ID: <address@hidden><br>

On June 12, 2005 6:18 PM Tim Daly wrote:

> you're right, of course.
> my mind is mush at the moment.
> i've been driving all week.

Yah, that'll do it to ya ... :) How's CMU?

Seriously, I think that there *are* some issues that
should be dealt with here. There may very well be some
situations that could make more efficient and/or effective
use of Axiom's "infinite precision" floats. I expect that
just as in the case of "infinite" integers, there may be
some better ways and some worse ways of doing calculations
with these objects.

For that matter, mathematically speaking just what is Axiom's
'Float' type? Is it formally equivalent (though obviously not
computationally equivalent) to 'FRAC INT'? How does it relate
to standard (IEEE ?) definitions of floating point? How does
it differ mathematically from the reals, cf. 'RealClosure'?

Bill Page.

From wyscc Mon, 13 Jun 2005 07:00:00 -04:00

Axiom has 'DoubleFloat' (formerly called 'SmallFloat', hence 'sf.spad', I 
think) and 'Float' ('float.spad'). 'DoubleFloat' is IEEE, or "native precision" 
(according to 'sf.spad'). Mathematically speaking, any floating point system is 
only a small subset, and not an evenly distributed one at that, of the reals, 
and for that matter, of the rationals. It is definitely not 'FRAC INT', 
which <B>is</B> mathematically equivalent to the field of rational numbers.
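The distinction is easy to see in any language with both exact rationals and hardware floats. A minimal Python sketch (Python's 'Fraction' playing the role of 'FRAC INT', its 'float' the role of a fixed-precision binary float):

```python
from fractions import Fraction

# Exact rational arithmetic, like 'FRAC INT': no rounding ever occurs.
assert Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)

# IEEE binary floating point holds only a finite subset of the rationals;
# neither 0.1 nor 0.2 is exactly representable, so the "same" sum picks up
# representation error:
print(0.1 + 0.2 == 0.3)   # False
```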

In a fixed precision floating point system there are only finitely many 
numbers, so it cannot be 'FRAC INT'. In such a system, a floating point number 
has the form $m \times b^e$, where $m$ is the mantissa (assume finite 
precision), $b$ is the base (fixed), and $e$ is the exponent (assume also 
finite precision). In practice, there is an offset when $m$ and $e$ are stored: 
$m$ is usually interpreted with a "base-point" and its most significant digit 
normalized to be non-zero, and $e$ has an offset depending on the machine word 
length and on how a word is divided between $m$ and $e$ (and the signs, which 
for simplicity we will assume are included in $m$ and $e$).

Axiom's implementation uses the simplified representation $m \times b^e$ as it 
mathematically stands ($m$ is still normalized to give a unique representation, 
but the base point is to the right of the least significant digit). The 
intrinsic properties do not depend on the base or the word length, so let's 
assume $b = 10$ for discussion purposes.

Let's say the exponent has precision one (that is, only one digit plus one sign 
bit), so there are 19 exponents (from $-9$ to $9$). Say the mantissa $m$ has 
precision 2 (that is, only two digits plus one sign bit), so there are 181 
possible mantissas (from $-99$ to $-10$, 0, and 10 to 99). For simplicity, we 
will only discuss positive numbers. So one sees that there can only be 
$19 \times 90 = 1710$ positive numbers. But these are not evenly distributed: 
there are 90 positive numbers for each exponent, so 90 positive numbers between 
10 and 99, 90 between 100 and 990, and 90 between 1000 and 9900. It is clear 
that the accuracy with which a real number can be approximated by a floating 
point number in this system decreases as the exponent goes up. 
This is the source of round-off errors.
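The counts above are easy to verify by brute force. A short Python sketch that enumerates the whole toy system (exact rationals are used for the values so no binary rounding sneaks in):

```python
from fractions import Fraction

# Toy system from the text: b = 10, normalized 2-digit mantissa (10..99),
# 1-digit signed exponent (-9..9); positive numbers only.
values = sorted({m * Fraction(10)**e
                 for m in range(10, 100) for e in range(-9, 10)})

def count_in(lo, hi):
    return len([v for v in values if lo <= v <= hi])

print(len(values))           # 19 exponents x 90 mantissas = 1710
print(count_in(10, 99))      # 90 -- spacing 1
print(count_in(100, 990))    # 90 -- spacing 10
print(count_in(1000, 9900))  # 90 -- spacing 100: gaps grow with the exponent
```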

This is basically the situation in 'DFLOAT' (on Intel processors), where $b = 2$, 
$m$ has 53 bits stored in a 52-bit field (not including the sign; note that in 
base 2 the most significant digit of a normalized number must be 1, so there is 
no need to store it!), and $e$ has 11 bits (including sign, ranging from $-1075$ 
to $971$, which accounts for an offset of 53 because of the base point 
location). This is exactly equivalent to the IEEE-754 standard for 64-bit 
floating point. The actual arithmetic is done via Lisp, which I assume calls 
the hardware.
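One can inspect that bit layout directly. A Python sketch that unpacks a binary64 value into its sign, exponent field, and 52-bit stored fraction (the leading 1 of the 53-bit mantissa being implicit):

```python
import struct

def fields(x: float):
    """Split an IEEE-754 binary64 value into
    (sign bit, 11-bit exponent field, 52-bit fraction field)."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

print(fields(1.0))           # (0, 1023, 0): 1.0 = (1 + 0) * 2**(1023 - 1023)

# 1.5 is 1.1 in binary: the implicit leading 1 is dropped and only the
# fractional bits 1000...0 are stored.
print(fields(1.5)[2] == 1 << 51)   # True
```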

'FLOAT', conceptually the infinite precision floating point system, is 
basically also a finite precision floating point system, with the ability to 
increase precision as requested. However, this promotion is not automatic. In 
other words, every 'FLOAT' number has its own precision, implicit in the length 
of its mantissa. In effect, 'FLOAT' is the union of floating point systems with 
various precisions, but without any coercion between them. So Tim is not 
entirely wrong in saying we don't have infinite precision floating point in the 
spirit of 'INT'. The real question for the developers is then: 

<B>Shall we improve 'FLOAT' or shall we implement a new domain of infinite 
precision float (and if so, how, or rather, what are the specifications of such 
a domain)?</B>

Example: addition of two 'FLOAT' values with different orders of magnitude will 
not lead to a higher precision answer. What the right answer should be depends 
on your point of view: if the precision of the number is important, then this 
is correct; if exact arithmetic with floating point is what you want, it is not.

If 'x' and 'y' are 'FLOAT', but 'y' is much larger than 'x', say the exponent 
of 'y' exceeds that of 'x' (when both 'x' and 'y' are normalized) by the 
current precision of 'FLOAT', then 'x+y' would be 'y', not 'x+y' with extended 
precision (see the routine 'plus' in 'float.spad'). Note that 68 bits is 
approximately 20 decimal digits. The display for 'y' should have 9 trailing 
zeros after the decimal.


By the way, in interpreting the correctness of displayed answers for floating 
point computation, there is another source of error besides round-off. For 
example,

z - x

Most people would expect the answer of 'z-x' to be '0.16887242 E-20', but this 
ignores the fact that the display is converted from an internal binary 
representation to a decimal one. During the conversion there is truncation 
error (think of 0.1 in base 3 converted to decimal 0.3333..., with output at 
any finite precision). So 'x' is not what it seems, and neither is 'z'. Below, 
the constants are converted to binary internally before computation, at a 
higher precision than 'x' (resp. 'z'), to bring out the differences. We can now 
see that 'x' is larger than 'z'. So Axiom is correct and the expected answer is 
wrong.

x - 1.4142135623730950488
z - 1.4142135623730950488016887242
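The same "a decimal constant is not what it seems" effect is easy to exhibit outside Axiom. In Python, 'Fraction(0.1)' recovers the exact rational value the machine actually stores for the decimal literal '0.1' (binary64 here rather than Axiom's 68-bit 'FLOAT', but the phenomenon is identical):

```python
from fractions import Fraction

stored = Fraction(0.1)                  # exact value of the binary64 double
print(stored == Fraction(1, 10))        # False: 0.1 is not representable
print(stored > Fraction(1, 10))         # True: here the conversion rounds UP
print(float(stored - Fraction(1, 10)))  # the (tiny) conversion error
```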
