bug#8525: Lisp reader and string-to-number bugs and inconsistencies

bug-gnu-emacs
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#8525: Lisp reader and string-to-number bugs and inconsistencies

From:	Paul Eggert
Subject:	bug#8525: Lisp reader and string-to-number bugs and inconsistencies
Date:	Thu, 21 Apr 2011 13:05:51 -0700
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.15) Gecko/20110307 Fedora/3.1.9-0.39.b3pre.fc14 Thunderbird/3.1.9
On 04/20/11 06:08, Stefan Monnier wrote:
> We want to use floats rather than signal an overflow (this is evident
> from the history of the code since the conversion to floats was added
> somewhat recently).

OK I came up with a revised patch (below) which does that.  I want to
test it a bit more, but thought I'd publish it now.  A couple of things:

* It's easy to use floats for huge base-10 numbers, but there's no
  portable and accurate way to convert huge non-base-10 values to
  floats.  This patch still signals overflow for cases like
  (string-to-number "ffffffffffffffffffffffffff" 16) where the integer
  cannot be represented as a 64-bit number.  This is the best we can
  do easily, without going to arbitrary-precision arithmetic, or
  without having some rounding errors that I'd rather avoid.

* This patch uses strtoumax to get the widest available conversions
  for non-base-10 integers, as the result can be converted to float easily
  and accurately.  strtoumax is a C99 function and works on standard
  modern platforms; for the oddball exception we can use the gnulib
  substitute.  This part is a one-line change to Makefile.in that
  drags in a bunch of gnulib code, but this code is well-tested on old
  platforms and isn't needed on new ones.  I've attached a gzipped
  patch for this porting change.

For Windows, the implications for strtoumax are that <inttypes.h>
should exist, and should define uintmax_t and strtoumax.  If Windows
doesn't already do that, a simple substitute inttypes.h like the
following should do the trick:

   #define uintmax_t unsigned long long
   #define strtoumax strtoull

or, if Windows doesn't support long long, then use "unsigned long"
and "strtoul".

Here's the patch.  I'm afraid that it has to be applied by hand,
as I hand-mangled it out of a bunch of other patches.  If you
want something that applies automatically please let me know and
I'll generate one.

2011-04-21  Paul Eggert  <eggert@cs.ucla.edu>

        Make the Lisp reader and string-to-float more consistent.
        * data.c (atof): Remove decl; no longer used or needed.
        (digit_to_number): Move to lread.c.
        (Fstring_to_number): Use new string_to_number function, to be
        consistent with how the Lisp reader treats infinities and NaNs.
        Do not assume that floating-point numbers represent EMACS_INT
        without losing information; this is not true on most 64-bit hosts.
        Avoid double-rounding errors, by insisting on integers when
        parsing non-base-10 numbers, as the documentation specifies.
        * lisp.h (string_to_number): New decl, replacing ...
        (isfloat_string): Remove.
        * lread.c: Include <inttypes.h>, for uintmax_t and strtoumax.
        (read1): Do not accept +. and -. as integers; this
        appears to have been a coding error.  Similarly, do not accept
        strings like +-1e0 as floating point numbers.  Do not report
        overflow for integer overflows unless the base is not 10 which
        means we have no simple and reliable way to continue.
        Break out the floating-point parsing into a new
        function string_to_number, so that Fstring_to_number parses
        floating point numbers consistently with the Lisp reader.
        (digit_to_number): Moved here from data.c.  Make it static inline.
        (E_CHAR, EXP_INT): Remove, replacing with ...
        (E_EXP): New macro, to solve the "1.0e+" problem mentioned below.
        (string_to_number): New function, replacing isfloat_string.
        This function checks for valid syntax and produces the resulting
        Lisp float number too.  Rework it so that string-to-number
        no longer mishandles examples like "1.0e+".  Use strtoumax,
        so that overflow for non-base-10 numbers is reported only when
        there's no portable and simple way to convert to floating point.

diff -pu trunk/src/data.c atest/src/data.c
--- trunk/src/data.c    2011-04-16 16:07:57.605482000 -0700
+++ atest/src/data.c    2011-04-20 14:48:29.597646000 -0700
@@ -48,10 +48,6 @@ along with GNU Emacs.  If not, see <http
 
 #include <math.h>
 
-#if !defined (atof)
-extern double atof (const char *);
-#endif /* !atof */
-
 Lisp_Object Qnil, Qt, Qquote, Qlambda, Qunbound;
 static Lisp_Object Qsubr;
 Lisp_Object Qerror_conditions, Qerror_message, Qtop_level;
@@ -2374,35 +2370,10 @@ NUMBER may be an integer or a floating p
   return build_string (buffer);
 }
 
-INLINE static int
-digit_to_number (int character, int base)
-{
-  int digit;
-
-  if (character >= '0' && character <= '9')
-    digit = character - '0';
-  else if (character >= 'a' && character <= 'z')
-    digit = character - 'a' + 10;
-  else if (character >= 'A' && character <= 'Z')
-    digit = character - 'A' + 10;
-  else
-    return -1;
-
-  if (digit >= base)
-    return -1;
-  else
-    return digit;
-}
-
 DEFUN ("string-to-number", Fstring_to_number, Sstring_to_number, 1, 2, 0,
        doc: /* Parse STRING as a decimal number and return the number.
 This parses both integers and floating point numbers.
@@ -2415,7 +2386,6 @@ If the base used is not 10, STRING is al
 {
   register char *p;
   register int b;
-  int sign = 1;
   Lisp_Object val;
 
   CHECK_STRING (string);
@@ -2430,40 +2400,13 @@ If the base used is not 10, STRING is al
        xsignal1 (Qargs_out_of_range, base);
     }
 
-  /* Skip any whitespace at the front of the number.  Some versions of
-     atoi do this anyway, so we might as well make Emacs lisp consistent.  */
   p = SSDATA (string);
   while (*p == ' ' || *p == '\t')
     p++;
 
-  if (*p == '-')
-    {
-      sign = -1;
-      p++;
-    }
-  else if (*p == '+')
-    p++;
-
-  if (isfloat_string (p, 1) && b == 10)
-    val = make_float (sign * atof (p));
-  else
-    {
-      double v = 0;
-
-      while (1)
-       {
-         int digit = digit_to_number (*p++, b);
-         if (digit < 0)
-           break;
-         v = v * b + digit;
-       }
-
-      val = make_fixnum_or_float (sign * v);
-    }
-
-  return val;
+  val = string_to_number (p, b, 1);
+  return NILP (val) ? make_number (0) : val;
 }
-
 
 enum arithop
   {
diff -pu trunk/src/lisp.h atest/src/lisp.h
--- trunk/src/lisp.h    2011-04-15 01:47:55.574976000 -0700
+++ atest/src/lisp.h    2011-04-21 10:25:15.716995000 -0700
@@ -2782,7 +2782,7 @@ extern Lisp_Object oblookup (Lisp_Object
   } while (0)
 extern int openp (Lisp_Object, Lisp_Object, Lisp_Object,
                   Lisp_Object *, Lisp_Object);
-extern int isfloat_string (const char *, int);
+Lisp_Object string_to_number (char const *, int, int);
 extern void map_obarray (Lisp_Object, void (*) (Lisp_Object, Lisp_Object),
                          Lisp_Object);
 extern void dir_warning (const char *, Lisp_Object);
diff -pu trunk/src/lread.c atest/src/lread.c
--- trunk/src/lread.c   2011-04-14 16:10:48.006381000 -0700
+++ atest/src/lread.c   2011-04-21 12:15:13.973449000 -0700
@@ -19,6 +19,7 @@ along with GNU Emacs.  If not, see <http
 
 
 #include <config.h>
+#include <inttypes.h>
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -3005,86 +3006,9 @@ read1 (register Lisp_Object readcharfun,
 
        if (!quoted && !uninterned_symbol)
          {
-           register char *p1;
-           p1 = read_buffer;
-           if (*p1 == '+' || *p1 == '-') p1++;
-           /* Is it an integer? */
-           if (p1 != p)
-             {
-               while (p1 != p && (c = *p1) >= '0' && c <= '9') p1++;
-               /* Integers can have trailing decimal points.  */
-               if (p1 > read_buffer && p1 < p && *p1 == '.') p1++;
-               if (p1 == p)
-                 /* It is an integer. */
-                 {
-                   if (p1[-1] == '.')
-                     p1[-1] = '\0';
-                   {
-                     /* EMACS_INT n = atol (read_buffer); */
-                     char *endptr = NULL;
-                     EMACS_INT n = (errno = 0,
-                                    strtol (read_buffer, &endptr, 10));
-                     if (errno == ERANGE && endptr)
-                       {
-                         Lisp_Object args
-                           = Fcons (make_string (read_buffer,
-                                                 endptr - read_buffer),
-                                    Qnil);
-                         xsignal (Qoverflow_error, args);
-                       }
-                     return make_fixnum_or_float (n);
-                   }
-                 }
-             }
-           if (isfloat_string (read_buffer, 0))
-             {
-               /* Compute NaN and infinities using 0.0 in a variable,
-                  to cope with compilers that think they are smarter
-                  than we are.  */
-               double zero = 0.0;
-
-               double value;
-
-               /* Negate the value ourselves.  This treats 0, NaNs,
-                  and infinity properly on IEEE floating point hosts,
-                  and works around a common bug where atof ("-0.0")
-                  drops the sign.  */
-               int negative = read_buffer[0] == '-';
-
-               /* The only way p[-1] can be 'F' or 'N', after isfloat_string
-                  returns 1, is if the input ends in e+INF or e+NaN.  */
-               switch (p[-1])
-                 {
-                 case 'F':
-                   value = 1.0 / zero;
-                   break;
-                 case 'N':
-                   value = zero / zero;
-
-                   /* If that made a "negative" NaN, negate it.  */
-
-                   {
-                     int i;
-                     union { double d; char c[sizeof (double)]; } u_data, 
u_minus_zero;
-
-                     u_data.d = value;
-                     u_minus_zero.d = - 0.0;
-                     for (i = 0; i < sizeof (double); i++)
-                       if (u_data.c[i] & u_minus_zero.c[i])
-                         {
-                           value = - value;
-                           break;
-                         }
-                   }
-                   /* Now VALUE is a positive NaN.  */
-                   break;
-                 default:
-                   value = atof (read_buffer + negative);
-                   break;
-                 }
-
-               return make_float (negative ? - value : value);
-             }
+           Lisp_Object result = string_to_number (read_buffer, 10, 0);
+           if (! NILP (result))
+             return result;
          }
        {
          Lisp_Object name, result;
@@ -3242,74 +3166,179 @@ substitute_in_interval (INTERVAL interva
 }
 
 
+static inline int
+digit_to_number (int character, int base)
+{
+  int digit;
+
+  if ('0' <= character && character <= '9')
+    digit = character - '0';
+  else if ('a' <= character && character <= 'z')
+    digit = character - 'a' + 10;
+  else if ('A' <= character && character <= 'Z')
+    digit = character - 'A' + 10;
+  else
+    return -1;
+
+  return digit < base ? digit : -1;
+}
+
 #define LEAD_INT 1
 #define DOT_CHAR 2
 #define TRAIL_INT 4
-#define E_CHAR 8
-#define EXP_INT 16
+#define E_EXP 16
 
-int
-isfloat_string (const char *cp, int ignore_trailing)
+
+/* Convert STRING to a number, assuming base BASE.  Return a fixnum if CP has
+   integer syntax and fits in a fixnum, else return the nearest float if CP has
+   either floating point or integer syntax and BASE is 10, else return nil.  If
+   IGNORE_TRAILING is nonzero, consider just the longest prefix of CP that has
+   valid floating point syntax.  Signal an overflow if BASE is not 10 and the
+   number has integer syntax but does not fit.  */
+
+Lisp_Object
+string_to_number (char const *string, int base, int ignore_trailing)
 {
   int state;
-  const char *start = cp;
+  char const *cp = string;
+  int leading_digit;
+  int float_syntax = 0;
+  double value = 0;
+
+  /* Compute NaN and infinities using a variable, to cope with compilers that
+     think they are smarter than we are.  */
+  double zero = 0;
+
+  /* Negate the value ourselves.  This treats 0, NaNs, and infinity properly on
+     IEEE floating point hosts, and works around a formerly-common bug where
+     atof ("-0.0") drops the sign.  */
+  int negative = *cp == '-';
+
+  int signedp = negative || *cp == '+';
+  cp += signedp;
 
   state = 0;
-  if (*cp == '+' || *cp == '-')
-    cp++;
 
-  if (*cp >= '0' && *cp <= '9')
+  leading_digit = digit_to_number (*cp, base);
+  if (0 <= leading_digit)
     {
       state |= LEAD_INT;
-      while (*cp >= '0' && *cp <= '9')
-       cp++;
+      do
+       ++cp;
+      while (0 <= digit_to_number (*cp, base));
     }
   if (*cp == '.')
     {
       state |= DOT_CHAR;
       cp++;
     }
-  if (*cp >= '0' && *cp <= '9')
-    {
-      state |= TRAIL_INT;
-      while (*cp >= '0' && *cp <= '9')
-       cp++;
-    }
-  if (*cp == 'e' || *cp == 'E')
-    {
-      state |= E_CHAR;
-      cp++;
-      if (*cp == '+' || *cp == '-')
-       cp++;
-    }
 
-  if (*cp >= '0' && *cp <= '9')
-    {
-      state |= EXP_INT;
-      while (*cp >= '0' && *cp <= '9')
-       cp++;
-    }
-  else if (cp == start)
-    ;
-  else if (cp[-1] == '+' && cp[0] == 'I' && cp[1] == 'N' && cp[2] == 'F')
+  if (base == 10)
     {
-      state |= EXP_INT;
-      cp += 3;
+      if ('0' <= *cp && *cp <= '9')
+       {
+         state |= TRAIL_INT;
+         do
+           cp++;
+         while ('0' <= *cp && *cp <= '9');
+       }
+      if (*cp == 'e' || *cp == 'E')
+       {
+         char const *ecp = cp;
+         cp++;
+         if (*cp == '+' || *cp == '-')
+           cp++;
+         if ('0' <= *cp && *cp <= '9')
+           {
+             state |= E_EXP;
+             do
+               cp++;
+             while ('0' <= *cp && *cp <= '9');
+           }
+         else if (cp[-1] == '+'
+                  && cp[0] == 'I' && cp[1] == 'N' && cp[2] == 'F')
+           {
+             state |= E_EXP;
+             cp += 3;
+             value = 1.0 / zero;
+           }
+         else if (cp[-1] == '+'
+                  && cp[0] == 'N' && cp[1] == 'a' && cp[2] == 'N')
+           {
+             state |= E_EXP;
+             cp += 3;
+             value = zero / zero;
+
+             /* If that made a "negative" NaN, negate it.  */
+             {
+               int i;
+               union { double d; char c[sizeof (double)]; }
+                 u_data, u_minus_zero;
+               u_data.d = value;
+               u_minus_zero.d = -0.0;
+               for (i = 0; i < sizeof (double); i++)
+                 if (u_data.c[i] & u_minus_zero.c[i])
+                   {
+                     value = -value;
+                     break;
+                   }
+             }
+             /* Now VALUE is a positive NaN.  */
+           }
+         else
+           cp = ecp;
+       }
+
+      float_syntax = ((state & (DOT_CHAR|TRAIL_INT)) == (DOT_CHAR|TRAIL_INT)
+                     || state == (LEAD_INT|E_EXP));
     }
-  else if (cp[-1] == '+' && cp[0] == 'N' && cp[1] == 'a' && cp[2] == 'N')
+
+  /* Return nil if the number uses invalid syntax.  If IGNORE_TRAILING, accept
+     any prefix that matches.  Otherwise, the entire string must match.  */
+  if (! (ignore_trailing
+        ? ((state & LEAD_INT) != 0 || float_syntax)
+        : (!*cp && ((state & ~DOT_CHAR) == LEAD_INT || float_syntax))))
+    return Qnil;
+
+  /* If the number uses integer and not float syntax, and is in C-language
+     range, use its value, preferably as a fixnum.  */
+  if (0 <= leading_digit && ! float_syntax)
     {
-      state |= EXP_INT;
-      cp += 3;
+      uintmax_t n;
+
+      /* Fast special case for single-digit integers.  This also avoids a
+        glitch when BASE is 16 and IGNORE_TRAILING is nonzero, because in that
+        case some versions of strtoumax accept numbers like "0x1" that Emacs
+        does not allow.  */
+      if (digit_to_number (string[signedp + 1], base) < 0)
+       return make_number (negative ? -leading_digit : leading_digit);
+
+      errno = 0;
+      n = strtoumax (string + signedp, NULL, base);
+      if (errno == ERANGE)
+       {
+         /* Unfortunately there's no simple and accurate way to convert
+            non-base-10 numbers that are out of C-language range.  */
+         if (base != 10)
+           xsignal (Qoverflow_error, list1 (build_string (string)));
+       }
+      else if (n <= (negative ? -MOST_NEGATIVE_FIXNUM : MOST_POSITIVE_FIXNUM))
+       {
+         EMACS_INT signed_n = n;
+         return make_number (negative ? -signed_n : signed_n);
+       }
+      else
+       value = n;
     }
 
-  return ((ignore_trailing
-          || *cp == 0 || *cp == ' ' || *cp == '\t' || *cp == '\n'
-          || *cp == '\r' || *cp == '\f')
-         && (state == (LEAD_INT|DOT_CHAR|TRAIL_INT)
-             || state == (DOT_CHAR|TRAIL_INT)
-             || state == (LEAD_INT|E_CHAR|EXP_INT)
-             || state == (LEAD_INT|DOT_CHAR|TRAIL_INT|E_CHAR|EXP_INT)
-             || state == (DOT_CHAR|TRAIL_INT|E_CHAR|EXP_INT)));
+  /* Either the number uses float syntax, or it does not fit into a fixnum.
+     Convert it from string to floating point, unless the value is already
+     known because it is an infinity, a NAN, or its absolute value fits in
+     uintmax_t.  */
+  if (! value)
+    value = atof (string + signedp);
+
+  return make_float (negative ? -value : value);
 }
strtoumax-patch.txt.gz
Description: GNU Zip compressed data
[Prev in Thread]
Current Thread
[Next in Thread]
bug#8525: Lisp reader and string-to-number bugs and inconsistencies, Paul Eggert, 2011/04/20
- bug#8525: Lisp reader and string-to-number bugs and inconsistencies, Eli Zaretskii, 2011/04/20
- bug#8525: Lisp reader and string-to-number bugs and inconsistencies, Stefan Monnier, 2011/04/20
  - bug#8525: Lisp reader and string-to-number bugs and inconsistencies, Paul Eggert <=
    - bug#8525: Lisp reader and string-to-number bugs and inconsistencies, Stefan Monnier, 2011/04/24
Prev by Date: bug#8492: 23.3; Time to use a different binding for completion?
Next by Date: bug#8530: 23.2; Button 'mouse-face property loses Bold and Italic properties
Previous by thread: bug#8525: Lisp reader and string-to-number bugs and inconsistencies
Next by thread: bug#8525: Lisp reader and string-to-number bugs and inconsistencies
Index(es):
- Date
- Thread