bug-coreutils
Re: coreutils-8.2 misc/ls-time test failure


From: Jim Meyering
Subject: Re: coreutils-8.2 misc/ls-time test failure
Date: Wed, 16 Dec 2009 11:38:13 +0100

Eric Blake wrote:
> [adding bug-gnulib]
>
> According to Eric Blake on 12/15/2009 7:48 PM:
>> According to John Stanley on 12/15/2009 4:42 PM:
>>> Basically, what's happening is that 'touch -a ..' updated ctime in
>>> coreutils-7.6,
>>> but does not update ctime in coreutils-8.2 (hence misc/ls-time fails).
>>
>> Ouch.  That's a bug in the kernel; I can reproduce it:
>>
>> $ uname -a
>> Linux fencepost 2.6.26-2-xen-amd64 #1 SMP Thu Nov 5 04:27:12 UTC 2009
>> x86_64 GNU/Linux
>> $ touch q
>> $ stat -c '%x %z' q
>> 2009-12-15 21:46:33.186677568 -0500 2009-12-15 21:46:33.186677568 -0500
>> $ touch -a q
>> $ stat -c '%x %z' q
>> 2009-12-15 21:47:15.157175384 -0500 2009-12-15 21:46:33.186677568 -0500
>> $
>
> According to strace, coreutils 6.10 used syscall_280 (which I'm assuming
> is utimensat, and that strace is just behind the times compared to the
> kernel); ltrace says it was via:
> futimesat(0, 0, 0x7fff0568c900, 0, 3)            = 0
>
> The newer coreutils likewise uses syscall_280, but via:
>
> futimens(0, 0x7fff5b31a450, 0x60ebd0, 0x7fff5b31a450, 3) = 0
>
> By comparing the results of 'touch f' and 'touch -a f', it appears that
> the kernel ctime bug is only triggered when UTIME_OMIT is passed as one of
> the two timestamps (which is only possible via futimens/utimensat, not
> futimesat).  And that is consistent with the fact that coreutils didn't
> use UTIME_OMIT until coreutils 8.1.
>
> Also, it means that I can probably devise a way to work around the bug in
> gnulib while we wait for the kernel folks to fix their bug.  However,
> there's a question of the minimal number of syscalls needed to fix the
> problem.  It may be that UTIME_NOW also has an impact.  My current idea:
>
> Keep a cache variable that shows whether UTIME_OMIT works (0=unknown,
> 1=yes, -1=no).  If the variable is -1, then treat UTIME_OMIT the same way
> as we do for futimesat (that is, call stat()/gettime() to populate the
> struct timespec prior to making the syscall).  If the variable is 1, then
> the kernel has been fixed.
>
> If the variable is 0, then perform [f]stat both before and after the
> utimensat call; if the times differ, set the cache variable to 1 and we're
> done.  Otherwise, ctime didn't change, so also call gettime().  If gettime
> is within 10 ms of the second stat, the results are inconclusive (given
> that we have proven that some filesystems have a quantization boundary of
> 10 ms where multiple actions within that window all end up with the
> timestamp), so leave the cache at 0, but re-call utimensat anyway with
> the times learned by stat/gettime().  Otherwise, the current time and the
> second ctime differ by more than 10 ms, so utimensat UTIME_OMIT is broken;
> set cache to -1, and fix the problem by re-calling utimensat with the
> times learned by stat/gettime().
>
> Sounds quite hairy.  Any ideas for improvements?

Thanks for investigating and scoping out the solution.
I agree that it sounds hairy, but it also sounds like the required approach.
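
For reference, the three-state cache described above might be sketched
roughly as follows.  This is only an illustration under assumptions: the
names omit_state, omit_cache, classify_omit and touch_atime are
hypothetical and are not gnulib identifiers, the 10 ms window is taken
straight from the description above, and the fallback supplies both
timestamps explicitly via fstat() and clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names for illustration; not gnulib's identifiers. */
enum omit_state { OMIT_BROKEN = -1, OMIT_UNKNOWN = 0, OMIT_WORKS = 1 };

enum omit_state omit_cache = OMIT_UNKNOWN;

/* Signed difference a - b in nanoseconds. */
static long long ts_diff_ns(struct timespec a, struct timespec b)
{
    return (long long)(a.tv_sec - b.tv_sec) * 1000000000LL
           + (a.tv_nsec - b.tv_nsec);
}

/* Classify one probe: ctime before and after the UTIME_OMIT call,
   plus the current time sampled after the second stat. */
enum omit_state classify_omit(struct timespec before, struct timespec after,
                              struct timespec now)
{
    long long age;

    if (ts_diff_ns(after, before) != 0)
        return OMIT_WORKS;              /* ctime moved: kernel is fine */
    age = ts_diff_ns(now, after);
    if (age < 0)
        age = -age;
    if (age <= 10 * 1000000LL)
        return OMIT_UNKNOWN;            /* inside the 10 ms window */
    return OMIT_BROKEN;                 /* ctime is stale: bug present */
}

/* Set only atime of FD to the current time.  Returns 0 or -1. */
int touch_atime(int fd)
{
    struct timespec omit[2] = { { .tv_nsec = UTIME_NOW },
                                { .tv_nsec = UTIME_OMIT } };
    struct timespec explicit_times[2];
    struct timespec now;
    struct stat before, after;

    assert(fd >= 0);
    if (omit_cache == OMIT_WORKS)
        return futimens(fd, omit);
    if (fstat(fd, &before) != 0)
        return -1;
    if (omit_cache == OMIT_UNKNOWN) {
        if (futimens(fd, omit) != 0 || fstat(fd, &after) != 0
            || clock_gettime(CLOCK_REALTIME, &now) != 0)
            return -1;
        omit_cache = classify_omit(before.st_ctim, after.st_ctim, now);
        if (omit_cache == OMIT_WORKS)
            return 0;
    }
    /* Broken or inconclusive: spell out both timestamps, as is
       already done for futimesat. */
    if (clock_gettime(CLOCK_REALTIME, &now) != 0)
        return -1;
    explicit_times[0] = now;
    explicit_times[1] = before.st_mtim;
    return futimens(fd, explicit_times);
}
```

Keeping the classification in its own function makes the three outcomes
easy to exercise with fixed timestamps, independent of the kernel in use.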

> And how best to report this bug to the kernel folks?

Posting a minimal demo to lkml should do it.
It'd be good to identify the affected kernel versions
so we can document it and have a chance at someday
removing the work-around code when those kernels
are no longer relevant.
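
A minimal demo along those lines could look something like the sketch
below.  It is an assumption about what such a report might contain, not
the code that was actually posted: the function name
ctime_advances_with_omit and its wait_ms parameter are made up for
illustration.  It creates a file, waits, updates only atime through
utimensat with UTIME_OMIT for mtime, and reports whether ctime moved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper.  Returns 1 if ctime changed after an atime-only
   update that passes UTIME_OMIT for mtime, 0 if ctime stayed the same
   (the behavior reported in this thread), and -1 on error.  WAIT_MS
   should exceed the timestamp granularity of the filesystem. */
int ctime_advances_with_omit(const char *path, long wait_ms)
{
    struct timespec times[2] = { { .tv_nsec = UTIME_NOW },
                                 { .tv_nsec = UTIME_OMIT } };
    struct timespec pause;
    struct stat before, after;
    int fd;

    assert(wait_ms >= 0);
    pause.tv_sec = wait_ms / 1000;
    pause.tv_nsec = (wait_ms % 1000) * 1000000L;

    /* Create (or truncate) the file, like the initial 'touch q'. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    if (stat(path, &before) != 0)
        return -1;

    /* Let the clock move past the creation timestamp. */
    nanosleep(&pause, NULL);

    /* Equivalent of 'touch -a': new atime, mtime left alone. */
    if (utimensat(AT_FDCWD, path, times, 0) != 0)
        return -1;
    if (stat(path, &after) != 0)
        return -1;

    return before.st_ctim.tv_sec != after.st_ctim.tv_sec
           || before.st_ctim.tv_nsec != after.st_ctim.tv_nsec;
}
```

Calling it from a small main with a path on the filesystem under test,
and printing the result next to the uname output, would give a
per-kernel record; a return value of 0 corresponds to the failure seen
above.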



