bug#7489: [coreutils] over aggressive threads in sort


From: Jim Meyering
Subject: bug#7489: [coreutils] over aggressive threads in sort
Date: Sun, 05 Dec 2010 12:21:01 +0100

Paul Eggert wrote:
> On 11/29/2010 02:46 PM, Paul Eggert wrote:
>> My current guess, by the way,
>> is that it's not a bug that can be triggered: it's merely
>> useless code that is harmless and can safely be removed.
>
> I removed it as part of the following series of cleanup
> patches.  These are intended merely to refactor the code
> and simplify it a bit, to make it easier to fix the CPU
> spinlock bug.  Please feel free to undo anything that
> looks at all questionable.

Hi Paul,

Thanks for all the clean-up.

I have no idea whether the following is a result of your changes,
since the segfault has been hard to reproduce.
It is from the sort-compress test, and so far it has happened
only twice during "make -j9 check" on a quad-core Fedora 14 system:

    Core was generated by `sort --compress-program=./dzip -S 1k in'.
    Program terminated with signal 11, Segmentation fault.
    #0  queue_check_insert (queue=0x7fffdbdc5620, node=0x5) at sort.c:3322
    3322      if (! node->queued)
    (gdb) p node
    $1 = (struct merge_node *) 0x5
    (gdb) bt
    #0  queue_check_insert (queue=0x7fffdbdc5620, node=0x5) at sort.c:3322
    #1  0x00000000004055a9 in queue_check_insert_parent (
        lines=<value optimized out>, dest=<value optimized out>,
        nthreads=140173261458952, total_lines=10, parent=<value optimized out>,
        lo_child=<value optimized out>, queue=0x7fffdbdc5620, tfp=0x1c2fb90,
        temp_output=0x1c2f72c "./sortpns55x") at sort.c:3340
    #2  merge_loop (lines=<value optimized out>, dest=<value optimized out>,
        nthreads=140173261458952, total_lines=10, parent=<value optimized out>,
        lo_child=<value optimized out>, queue=0x7fffdbdc5620, tfp=0x1c2fb90,
        temp_output=0x1c2f72c "./sortpns55x") at sort.c:3374
    #3  sortlines (lines=<value optimized out>, dest=<value optimized out>,
        nthreads=140173261458952, total_lines=10, parent=<value optimized out>,
        lo_child=<value optimized out>, queue=0x7fffdbdc5620, tfp=0x1c2fb90,
        temp_output=0x1c2f72c "./sortpns55x") at sort.c:3515
    #4  0x00000000004059cb in sortlines_thread (data=<value optimized out>)
        at sort.c:3428
    #5  0x0000003f49806d5b in start_thread () from /lib64/libpthread-2.12.90.so
    #6  0x0000003f48ce4aad in clone () from /lib64/libc-2.12.90.so
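
For context, and purely my reading of the trace: 0x5 cannot be a
valid struct merge_node pointer, so the first load in
queue_check_insert faults.  A minimal sketch of that crash site
follows; the struct is reduced to the one field visible above, and
the real sort.c declarations differ:

  #include <stdbool.h>

  struct merge_queue;           /* opaque here; a priority queue in sort.c */

  struct merge_node
  {
    bool queued;                /* true while the node is in the queue */
    /* other fields elided */
  };

  static void
  queue_check_insert (struct merge_queue *queue, struct merge_node *node)
  {
    /* With node == (struct merge_node *) 0x5, reading node->queued
       dereferences an unmapped address and raises SIGSEGV.  A wild
       pointer this small usually means the value came from memory
       that another thread had not yet initialized or had already
       reused.  */
    if (! node->queued)
      {
        /* insert NODE and wake a waiting merge thread (elided) */
      }
  }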

However, there is another failure, also based on sort-compress,
that makes me suspicious:

  seq -w 200000 > exp && tac exp > in
  PATH=.:$PATH ./sort --compress-program=dzip -S 1k in > out

That gets stuck in waitpid (called from sort.c's reap), waiting for a
dzip invocation that appears never to terminate.  This is also on
that same quad-core system, and it is relatively easy to reproduce,
so it should be easy to identify the offending change, but I'm
out of time for now.

The hang is also reproducible with just 2000 input lines,
but then it doesn't arise as consistently.
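
To pin down where it blocks: reap ends up in a plain blocking
waitpid.  Here is a minimal sketch of that shape, with an assumed
helper name; it is illustrative, not the actual sort.c code:

  #include <sys/types.h>
  #include <sys/wait.h>

  /* Roughly the shape of the wait inside reap.  */
  static int
  wait_for_compressor (pid_t pid)
  {
    int status = 0;
    /* Blocks until PID terminates.  If the dzip child never exits --
       one classic cause is that some process still holds the write
       end of the child's input pipe open, so it never sees EOF --
       then this call never returns and sort hangs right here.  */
    waitpid (pid, &status, 0);  /* error handling elided */
    return status;
  }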

I'll note in passing that the spinlock CPU utilization problem
is particularly noticeable when using --compress-program= because
there is a lot more waiting.
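
To illustrate the difference (hypothetical names, not sort.c
symbols): a thread that polls a shared flag in a loop burns a full
CPU for as long as it waits, while a condition-variable wait sleeps
in the kernel at no CPU cost; with an external compressor in the
pipeline, threads spend most of their time in exactly that waiting
state:

  #include <pthread.h>
  #include <stdbool.h>

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;
  static bool work_available;

  /* Spinning: rechecks the flag in a tight loop, keeping one core
     at 100% even though there is nothing to do.  */
  static void
  wait_spinning (void)
  {
    bool go = false;
    while (! go)
      {
        pthread_mutex_lock (&lock);
        go = work_available;
        pthread_mutex_unlock (&lock);
      }
  }

  /* Blocking: sleeps until another thread signals READY, using no
     CPU while waiting.  */
  static void
  wait_blocking (void)
  {
    pthread_mutex_lock (&lock);
    while (! work_available)
      pthread_cond_wait (&ready, &lock);
    pthread_mutex_unlock (&lock);
  }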
