bug#7597: multi-threaded sort can segfault (unrelated to the sort -u seg

From: Paul Eggert
Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault)
Date: Sun, 12 Dec 2010 13:42:31 -0800

On 12/12/2010 07:41 AM, Jim Meyering wrote:
> That sounds good, assuming it triggers the bug reliably for you.
> I was hoping to find a way to reproduce it without relying on gensort,
> but won't object if you want to do that.

In my attempts to reproduce the problem, it's pretty flaky.
I think it depends on how busy the operating system is.
Sometimes I'd get failures all the time; sometimes, almost
never.  (This was with valgrind; I had much less luck without it.)

Anyway, I pushed this, which seemed to work well enough
on my host.  It prefers gensort if available, but falls
back on seq+shuf if not.

+# Trigger a bug that would cause 'sort' to reference stale thread stack memory.
+# Copyright (C) 2010 Free Software Foundation, Inc.
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+# written by Jim Meyering and Paul Eggert
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ sort
+valgrind --help >/dev/null || skip_ "requires valgrind"
+test "$(nproc)" = 1 && skip_ "requires a multi-core system"
+# gensort output seems to trigger the failure more often,
+# so prefer gensort if it is available.
+(gensort -a 10000 in) 2>/dev/null ||
+  seq -f %-98f 10000 | shuf > in ||
+  framework_failure_
+# With the bug, 'sort' would fail under valgrind about half the time,
+# on some circa-2010 multicore Linux platforms.  Run the test 100 times
+# so that the probability of missing the bug should be about 1 in
+# 2**100 on these hosts.
+for i in $(seq 100); do
+  valgrind --quiet --error-exitcode=3 \
+      sort -S 100K --parallel=2 in > /dev/null ||
+    { fail=$?; echo iteration $i failed; Exit $fail; }
+done
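
The "1 in 2**100" figure in the test's comment can be checked with a rough
sketch like the one below.  It is not part of the patch.  It assumes the
runs are independent and takes the "about half the time" failure rate at
face value; the helper name miss_probability is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch.
# Assumption: each run detects the bug independently with probability p,
# so n runs all miss it with probability (1 - p)^n.
miss_probability() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3g\n", (1 - p) ^ n }'
}

miss_probability 0.5 10    # prints 0.000977 (about 1 in 2**10)
miss_probability 0.5 100   # prints 7.89e-31 (about 1 in 2**100)
```

On a host where the bug shows up far less often, the same formula suggests
the loop would catch it much less reliably, which fits the flakiness
described above.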

