Re: Multithreaded sort hangs on Solaris
From: Pádraig Brady
Subject: Re: Multithreaded sort hangs on Solaris
Date: Tue, 12 Mar 2013 11:06:59 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 03/11/2013 03:47 PM, McFarland, Jeffrey wrote:
> I have come across some odd results regarding the sort utility in coreutils
> version 8.20. I’ve looked through the archives and don’t see any similar
> issues so it may be something specific to our systems.
>
>
>
> System: SunOS 5.10 Generic_147440-26 sun4u sparc SUNW,Sun-Fire-V890
>
>
>
> Issue: When running sort on a 22.5 GB file I found that about 1 out of 10
> times the process seems to hang (out of 100+ tests). The process is still
> running but the temp files are no longer changing and the final file either
> has not been created or is a 0 byte file. When this happens the temp files
> are never in the exact same state as a previous run. On this machine a
> complete sort normally takes about 20 minutes. On one occasion the process
> hung for over 48 hours before I killed it. Running top shows no significant
> load on the system.
>
>
>
> Command run:
>
> ./sort -t\n -S 256M --batch-size=100 -T /disk/craiwk01/prod/SORTWK -T
> /disk/craiwk02/prod/SORTWK -T /disk/craiwk03/prod/SORTWK -T
> /disk/craiwk04/prod/SORTWK -T /disk/craiwk06/prod/SORTWK -k1.1,1.10 infile -o
> infile.sorted
>
>
>
>>: ps
>
> PID TTY TIME CMD
>
> 16328 pts/3 5:06 sort
>
> 12697 pts/3 0:00 ps
>
>
>
>>: sudo truss -rall -wall -f -p 16328
>
> 16328: lwp_park(0x00000000, 0) (sleeping...)
>
>
>
>>: sudo pstack 16328
>
> 16328: /usr/local/abacus/etsort/sort -tn -S 295063 --batch-size=100 -T /disk/
>
> ----------------- lwp# 1 / thread# 1 --------------------
>
> ffffffff7d4d8818 lwp_park (0, 0, 0)
>
> 0000000100009c74 sortlines (111b56580, 111c56080, ffffffff7fffeab0,
> 10012a321, ffffffff7fffead0, 10012a328) + 514
>
> 000000010000a5cc sortlines (111558380, 2, ffffffff7fffeab0, 1121765e0, 0,
> ffffffff7fffeab0) + e6c
>
> 000000010000a5cc sortlines (111956f80, 4, ffffffff7fffeab0, 112176420, 0,
> ffffffff7fffeab0) + e6c
>
> 000000010000a5cc sortlines (112154760, 8, ffffffff7fffeab0, 1121760a0, 1,
> ffffffff7fffeab0) + e6c
>
> 000000010000c070 sort (10012a740, 0, ffffffff7fffead0, 23, 10012cddd,
> 112154760) + 350
>
> 000000010000e6e8 main (13, ffffffff7ffff148, 0, 10012c220, fffd, 10012b1e0) +
> 1ee8
>
> 00000001000041bc _start (0, 0, 0, 0, 0, 0) + 7c
>
> ----------------- lwp# 240 / thread# 240 --------------------
>
> 000000010000a600 sortlines_thread(), exit value = 0x0000000000000000
>
> ** zombie (exited, not detached, not yet joined) **
>
> ----------------- lwp# 241 / thread# 241 --------------------
>
> 000000010000a600 sortlines_thread(), exit value = 0x0000000000000000
>
> ** zombie (exited, not detached, not yet joined) **
>
> ----------------- lwp# 242 / thread# 242 --------------------
>
> 000000010000a600 sortlines_thread(), exit value = 0x0000000000000000
>
> ** zombie (exited, not detached, not yet joined) **
>
>
>
> If I change the sort to run as a single threaded process (add “--parallel=1”
> to above command) then it doesn’t hang. This makes me think that it’s most
> likely a threading issue. I ran the same tests on a LINUX machine and it did
> not have the same hanging issue so it’s most likely only an issue with
> Solaris.
>
>
>
> I initially found this issue using coreutils 8.9 and I changed to 8.20 to see
> if a fix had been made but no luck.
>
>
>
> Is this a known issue? Are there any additional tests I should run to
> further narrow down this issue?
I can't think of anything offhand, TBH.
There may be some portability issues when combining --compress and --parallel
(functions that are possibly not async-signal-safe get called after a fork),
but you're not using --compress, so we can discount that at least.
Whether the bug is in coreutils or in Solaris,
adding some sleeps may help trigger the race more quickly.
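
Separately from adding sleeps, a loop along these lines could measure how often the hang occurs without watching each run by hand. This is only a sketch: `count_hangs` is a hypothetical helper name, and it assumes the `timeout` utility from the same coreutils build is on the PATH.

```shell
# Hypothetical helper: run a command N times, each under a time limit,
# and print how many runs exceeded the limit.
# usage: count_hangs RUNS LIMIT_SECONDS COMMAND [ARGS...]
count_hangs() {
  runs=$1
  limit=$2
  shift 2
  hangs=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    # GNU timeout exits with status 124 when the time limit is reached.
    if [ "$status" -eq 124 ]; then
      hangs=$((hangs + 1))
    fi
    i=$((i + 1))
  done
  echo "$hangs"
}

# Example (not run here): 20 attempts, each allowed one hour.
# count_hangs 20 3600 ./sort -S 256M --batch-size=100 infile -o infile.sorted
```

A limit well above the normal 20 minute run time keeps slow but healthy runs from being counted as hangs.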
BTW the `sort -t\n` looks strange: the shell removes the unquoted backslash,
so sort receives -tn (as your pstack output shows). Did you mean: sort -t$'\n' ?
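
To see the difference between the two spellings, something like the following could be run. It is a minimal sketch that assumes a shell with ANSI-C quoting ($'...'), such as bash, ksh93 or zsh; the variable names are only for illustration.

```shell
# Unquoted backslash: the shell drops it, leaving the characters "-tn".
unquoted=-t\n
printf 'unquoted: [%s]\n' "$unquoted"

# ANSI-C quoting: \n is turned into a real newline character.
ansi=-t$'\n'
printf '%s' "$ansi" | od -c
```

With the first form sort would use the letter "n" as the field separator rather than a newline.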
thanks,
Pádraig.