coreutils

Re: odd test failure: misc/sort-spinlock-abuse


From: Jim Meyering
Subject: Re: odd test failure: misc/sort-spinlock-abuse
Date: Wed, 13 Apr 2011 22:41:19 +0200

Jim Meyering wrote:
> The misc/sort-spinlock-abuse test,
>
>   http://git.sv.gnu.org/cgit/coreutils.git/tree/tests/misc/sort-spinlock-abuse
>
> fails regularly when it is run in parallel with others (make -j25 check,
> ext4+SSD, 6/12-core, F15).  It fails because an output-restrained sort
> (writing to a FIFO with a slow consumer) takes more than 1 second of CPU
> time to process the regular file, "in", created by "seq 100000 > in".
>
> In fact, it may take even more than 4 seconds of CPU time.
> At first I thought it was a regression.  But no:
> the problem arises even when running sort with --parallel=1.
> The original bug involved a parallel-specific busy-wait
> triggered by the blocked output.
>
> What's going on?
> I have traced it back to an fstat syscall that is consuming
> lots of CPU time.
>
> Here's strace -r output, where FD 3 refers to the regular input file, "in":
>
>     3.263544 fstat(3, {st_mode=S_IFREG|0600, st_size=588895, ...}) = 0
>
> When I run this test in isolation, it always completes successfully.
> In that case, the fstat takes 30-40 microseconds.
>
>     make check -C tests TESTS=misc/sort-spinlock-abuse VERBOSE=yes
>
> But when I run it via "make -j25 check", it fails ~40% of the time.
>
> Next step is probably to see if oprofile can shed some light.

[composed a few days ago]

I've just updated to a newer kernel,                2.6.38.2-9.fc15.x86_64.
The one I was using when experiencing the above was 2.6.38.1-6.fc15.x86_64.
Now, the test passes every time:  38 of 38 trials, so far.

Of course, it may not be the kernel, but rather the state of
the system -- it had been running for some time.

As I said: odd.
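
For anyone wanting to poke at this outside the test suite, the scenario in the
quoted message (sort writing to a FIFO with a slow consumer, under a 1-second
CPU limit) can be approximated roughly as below. This is only a sketch, not the
actual tests/misc/sort-spinlock-abuse script: the input file and --parallel=1
come from the message above, while the temporary directory, the 12-line /
0.1-second consumer pacing, and the variable names are assumptions made for
illustration. It also assumes GNU sort and a sleep that accepts fractions.

```shell
#!/bin/sh
# Sketch only: approximates the reported scenario, not the real test script.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

seq 100000 > in      # the regular input file named "in" in the report
mkfifo fifo

# Slow consumer (pacing is an assumption): read a dozen lines with a short
# pause after each, then drain whatever is left.
( i=0
  while [ "$i" -lt 12 ]; do
    read -r line
    sleep 0.1
    i=$((i + 1))
  done
  cat > /dev/null ) < fifo &

# Cap sort at 1 second of CPU time. A sort that burns CPU while its output
# is blocked would be killed by the limit and exit nonzero.
( ulimit -t 1; sort --parallel=1 in > fifo )
status=$?
wait

echo "sort exit status: $status"
cd / && rm -rf "$tmp"
```

A nonzero status here would correspond to the failure described above; on an
otherwise idle system one would expect it to be zero.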


