[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#12820: FWIW, this is still happening as of gnulib 4a82904
From: |
Nix |
Subject: |
bug#12820: FWIW, this is still happening as of gnulib 4a82904 |
Date: |
Wed, 27 Feb 2013 18:25:44 +0000 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux) |
On 26 Feb 2013, Paul Eggert outgrape:
> On 02/26/13 13:48, Nix wrote:
>
>> #2 0x00000000004021e0 in test_utimens (address@hidden, func=0x401890
>> <do_utimensat>) at test-utimens.h:35
>
> If that line number is right, the test program
> did a creat (file, 0600) that succeeded, followed
> by a stat (file, &st) that failed.
Looks like it.
Running test-utimensat in a tight loop shows frequent assertion
failures: running it under high load (e.g. while a 'make -j 9 check' is
running) shows fewer:
No load:
% START=$(date +%s); while [[ $(($(date +%s)-$START)) -lt 10 ]]; do
./test-utimensat; done
test-lutimens.h:162: assertion failed
test-lutimens.h:162: assertion failed
test-utimens.h:103: assertion failed
test-utimens.h:41: assertion failed
test-lutimens.h:162: assertion failed
Parallel make -j 9 check:
% START=$(date +%s); while [[ $(($(date +%s)-$START)) -lt 10 ]]; do
./test-utimensat; done
test-utimens.h:113: assertion failed
test-utimens.h:37: assertion failed
test-utimens.h:130: assertion failed
test-utimens.h:113: assertion failed
test-utimens.h:103: assertion failed
test-lutimens.h:64: assertion failed
test-utimens.h:128: assertion failed
test-utimens.h:103: assertion failed
test-lutimens.h:64: assertion failed
I note that this is not exactly healthy hardware: it is has a slowly
failing fan and is going in for repair tomorrow. However it isn't
overheating yet and *did* just complete a GCC LTO bootstrap-and-test
with no FAILs I wasn't expecting so I don't think the fan failure is
likely to be the cause of these assertions.
As is typical of timing-related faults, running it under strace makes
the frequency of the problems plunge. I've stuck a tarball of all the
distinct failing straces I could get in a few hours of tight loops under
strace up at
<http://www.esperi.org.uk/~nix/bugs/gnulib/utimensat.tar.gz>. (I
truncated the execve lines because dumping my entire process environment
out in public is not something I'm entirely happy doing. Paranoia,
maybe. :) )
Classifying the small set of assertion failures above (a set small
enough that I felt I could attack it without dying of boredom), we see:
test-utimens.h:37: assertion failed
ASSERT (func (BASE "file", NULL) == 0);
test-utimens.h:113: assertion failed
ASSERT (st3.st_atime == Y2K);
test-utimens.h:128: assertion failed
ASSERT (get_stat_mtime_ns (&st3) == get_stat_mtime_ns (&st2));
test-lutimens.h:64: assertion failed
test-lutimens.h:162: assertion failed
test-utimens.h:41: assertion failed
test-utimens.h:103: assertion failed
test-utimens.h:130: assertion failed
ASSERT (ctime_compare (&st1, &st2) < 0);
ASSERT (ctime_compare (&st2, &st3) < 0);
I thought it was worth my analyzing at least the largest set of
failures, this one. This is simply a race. When these assertions are
hit, I see things like this:
st1.st_ctime: 1361977729; st2.st_ctime: 1361977729; compare: 0;
get_stat_ctime_ns(st1): 613474299, get_stat_ctime_ns(st2): 613474299
i.e. the nsec values are identical. This is surely valid: it just means
the two stats were carried out in the same 1ns interval. All these
asserts should be <= 0. (I just saw an identical failure in
test-lchown.h:112.)