bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#9612: sort: avoid a NaN-induced infloop


From: Jim Meyering
Subject: bug#9612: sort: avoid a NaN-induced infloop
Date: Tue, 27 Sep 2011 16:48:45 +0200

This was reported by Aaron Denney in http://bugs.debian.org/642557.

Who would have thought that including a few NaNs in the input to sort
would make it infloop.

The original failure arose only when sort was reading from a pipe:

    yes -- -nan | head -156903 | sort -g > /dev/null

But it's not always easy to reproduce.
For example, on a uni-core Debian unstable amd64 system, the bug strikes
every time with the distro-supplied binary, but not at all on a fedora
rawhide-supplied binary in a multi-core VM.  However, when I compile
coreutils-8.13 myself on that same rawhide system, *that* sort fails
every time.

Setting up to debug was a little tricky.
In order to invoke sort-reading-from-pipe from gdb, I had to use a fifo:

    mkfifo sort-test-pipe
    yes -- -nan | head -156903 | sort-test-pipe &

Then, with gdb, you can do this:

    gdb sort
    run -g < sort-test-pipe


Debugging it, I was surprised to see the infinite loop iterating through
these "*"-marked lines, with nfiles == 2 and always resetting i = 0:

  (gdb) l
  2893         Since this only reorders two items if one is strictly greater 
than
  2894         the other, it is stable. */
  2895      for (i = 0; i < nfiles; ++i)
  2896        ord[i] = i;
* 2897      for (i = 1; i < nfiles; ++i)
* 2898        if (0 < compare (cur[ord[i - 1]], cur[ord[i]]))
* 2899          t = ord[i - 1], ord[i - 1] = ord[i], ord[i] = t, i = 0;
  2900
  2901      /* Repeatedly output the smallest line until no input remains. */
  2902      while (nfiles)
  (gdb) p ord[i-1]
  $6 = 0
  (gdb) p ord[i]
  $7 = 1
  (gdb) p compare (cur[0], cur[1])
  $8 = 64
  (gdb) p compare (cur[1], cur[0])
  $9 = 64

That shows       cur[0] > cur[1], and then, on the very next line
a contradiction, cur[1] > cur[0]

But it's not the classic NaN problem at all.  That would be too easy ;-)
That code already handles NaNs (via the general_numcompare function),
detecting the unusual case and comparing NaNs with memcmp.
Here's most of the general_numcompare function:

    static int
    general_numcompare (char const *sa, char const *sb)
    {
      char *ea;
      char *eb;
      long_double a = strtold (sa, &ea);
      long_double b = strtold (sb, &eb);
      ...
      /* Sort numbers in the usual way, where -0 == +0.  Put NaNs after
         conversion errors but before numbers; sort them by internal
         bit-pattern, for lack of a more portable alternative.  */
      return (a < b ? -1
              : a > b ? 1
              : a == b ? 0
              : b == b ? -1
              : a == a ? 1
              : memcmp (&a, &b, sizeof a));
    }

What could cause such bogosity?
Here's the quick answer:

The statement "long double x = NAN;" (inside glibc's strtold) leaves many
bits of "x" uninitialized.  For most uses of a NaN, that doesn't matter,
but here, since we're actually trying to order the different NaNs, it makes
all of the difference.  [ I'm about to file the glibc bug -- or maybe it's
a bug in the definition of gcc's __builtin_nan.  Haven't looked yet.  ]

What happens when we infloop is that the surrounding code always happens
to ensure that differing bits are 0x40 in the left operand and 0x00 in the
right operand to memcmp.  Then you will see precisely the above behavior.

One "fix" is to return 0 rather than bothering with the memcmp call above,
thus treating all NaNs as equal, rather than attempting to order them
based on their internal bit patterns.  But that is undesirable since
it would render sort's output indeterminate whenever it contains different
types of NaN values.   E.g., NaN, NaN(1), NaN(2), etc. all result in
different bit patterns.

For example, currently, this works as you'd expect on a system with a
NaN(...)-aware strtold, like anything glibc-based:

    $ for i in $(seq 9); do echo "NaN($i)";done|sort -gr
    NaN(9)
    NaN(8)
    NaN(7)
    NaN(6)
    NaN(5)
    NaN(4)
    NaN(3)
    NaN(2)
    NaN(1)

If we were to remove the memcmp, those would not be sorted.

----------------------------------------
A kludgey fix that serves to demonstrate the problem would be to
replace the two strtold invocations with these:

      long_double a = 0; a = strtold (sa, &ea);
      long_double b = 0; b = strtold (sb, &eb);

That does the job at least when compiling with gcc -ggdb3,
but minimal optimization could well eliminate the dead stores.

----------------------------------------
The real solution is to change glibc's strtold so that its "long double"
result contains no uninitialized bit.  In the shorter term, I'd like
a gnulib strtold module that works around the problem.
However, gnulib's strtod.c simply ignores the digits in
NaN(DDDDD), while glibc's implementation actually uses them.
That should be easy to fix, right?  Just merge in the glibc changes?
But no, they rely on ieee754.h's "union ieee854_long_double",
and that is not portable.  So I have resorted to an even shorter-term
work-around in the patch below.

=-=================================

Consequences: whenever qsort's comparison function "lies" (i.e., returns
1 for a < b, -1 for a > b, or 0 for a != b, qsort's behavior is undefined
(i.e., it may well segfault).  GNU sort(1) uses qsort only incidentally,
to sort month names, and nowhere else, but now, you've seen that sort(1)
had a similar problem when its comparison function is inconsistent.

--------------------------------------------------
BTW, once I found the problem code, I realized I could provoke
the infloop with two one-line inputs using sort's --merge option:

    echo nan > 1; sort -m -g 1 1

--------------------------------
Here's a little program to demonstrate the problem.
Note how the results differ with compilation options:

  #include <stdlib.h>
  #include <stdio.h>

  static char *
  fmt_nan (long double x)
  {
    unsigned int i;
    static char buf[33];
    unsigned char const *p = (unsigned char const *) &x;
    for (i = 0; i < sizeof x; i++)
      sprintf (buf + 2*i, "%02x", *p++);
    return buf;
  }

  int
  main ()
  {
    const char *q = "nan";
    long double x = strtold (q, NULL);
    printf ("%s\n", fmt_nan (x));

    x = 0;
    x = strtold (q, NULL);
    printf ("%s\n", fmt_nan (x));

    return 0;
  }

  $ gcc -O0 -Wall -Wextra -W /t/strtold-bogosity.c && ./a.out
  00000000000000c0ff7f400000000000
  00000000000000c0ff7f000000000000
  $ gcc -O1 -Wall -Wextra -W /t/strtold-bogosity.c && ./a.out
  00000000000000c0ff7f000000000000
  00000000000000c0ff7f000000000000

-------------------------------------------------------
And here's the patch I expect to apply for now:

>From 17f70721d10f02543ae9e2d4e2c4b2babe0206a7 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 27 Sep 2011 16:32:35 +0200
Subject: [PATCH] sort: avoid a NaN-induced infloop

These commands would fail to terminate:
    yes -- -nan | head -156903 | sort -g > /dev/null
    echo nan > F; sort -m -g F F
That can happen with any strtold implementation that includes
uninitialized data in its return value.  The problem arises in the
mergefps function when bubble-sorting the two or more lines, each
from one of the input streams being merged: compare(a,b) returns 64,
yet compare(b,a) also returns a positive value.  With a broken
comparison function like that, the bubble sort never terminates.
Why do the long-double bit strings corresponding to two identical
"nan" strings not compare equal?  Because some parts of the result
are uninitialized and thus depend on the state of the stack.
For more details, see http://bugs.gnu.org/FIXME.
* src/sort.c (nan_compare): New function.
(general_numcompare): Use it rather than bare memcmp.
Reported by Aaron Denney in http://bugs.debian.org/642557.
* NEWS (Bug fixes): Mention it.
* tests/misc/sort-NaN-infloop: New file.
* tests/Makefile.am (TESTS): Add it.
---
 NEWS                        |    4 ++++
 src/sort.c                  |   20 +++++++++++++++++++-
 tests/Makefile.am           |    1 +
 tests/misc/sort-NaN-infloop |   28 ++++++++++++++++++++++++++++
 4 files changed, 52 insertions(+), 1 deletions(-)
 create mode 100755 tests/misc/sort-NaN-infloop

diff --git a/NEWS b/NEWS
index 140e6fa..f05a088 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU coreutils NEWS                                    -*- 
outline -*-

 * Noteworthy changes in release ?.? (????-??-??) [?]

+** Bug fixes
+
+  sort -g no longer infloops for certain inputs containing NaNs
+
 ** Improvements

   md5sum --check now supports the -r format from the corresponding BSD tool.
diff --git a/src/sort.c b/src/sort.c
index 3d3119d..3e94a6e 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1910,6 +1910,24 @@ numcompare (char const *a, char const *b)
   return strnumcmp (a, b, decimal_point, thousands_sep);
 }

+/* Work around a problem whereby the long double value returned by glibc's
+   strtold ("NaN", ...) contains uninitialized bits: clear all bytes of
+   A and B before calling strtold.  FIXME: remove this function once
+   gnulib guarantees that strtold's result is always well defined.  */
+static int
+nan_compare (char const *sa, char const *sb)
+{
+  long_double a;
+  memset (&a, 0, sizeof a);
+  a = strtold (sa, NULL);
+
+  long_double b;
+  memset (&b, 0, sizeof b);
+  b = strtold (sb, NULL);
+
+  return memcmp (&a, &b, sizeof a);
+}
+
 static int
 general_numcompare (char const *sa, char const *sb)
 {
@@ -1935,7 +1953,7 @@ general_numcompare (char const *sa, char const *sb)
           : a == b ? 0
           : b == b ? -1
           : a == a ? 1
-          : memcmp (&a, &b, sizeof a));
+          : nan_compare (sa, sb));
 }

 /* Return an integer in 1..12 of the month name MONTH.
diff --git a/tests/Makefile.am b/tests/Makefile.am
index eeb4cab..2cf409a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -250,6 +250,7 @@ TESTS =                                             \
   misc/sort-unique                             \
   misc/sort-unique-segv                                \
   misc/sort-version                            \
+  misc/sort-NaN-infloop                                \
   split/filter                                 \
   split/suffix-length                          \
   split/b-chunk                                        \
diff --git a/tests/misc/sort-NaN-infloop b/tests/misc/sort-NaN-infloop
new file mode 100755
index 0000000..ead871e
--- /dev/null
+++ b/tests/misc/sort-NaN-infloop
@@ -0,0 +1,28 @@
+#!/bin/sh
+# exercise the NaN-infloop bug in coreutils-8.13
+
+# Copyright (C) 2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ sort
+
+echo nan > F || fail=1
+printf 'nan\nnan\n' > exp || fail=1
+timeout 10 sort -g -m F F > out || fail=1
+
+compare out exp || fail=1
+
+Exit $fail
--
1.7.7.rc0.362.g5a14





reply via email to

[Prev in Thread] Current Thread [Next in Thread]