--- Begin Message ---
Subject: |
Race condition in tests/tail-2/assert.sh |
Date: |
Fri, 11 Sep 2015 18:23:31 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) |
Hello,
We have observed intermittent failures of tests/tail-2/assert.sh
(Coreutils 8.24, libc 2.22), especially showing up on relatively slow
machines (armhf and mips64el.)
The failure is with ‘tail --follow=name’, which, in inotify mode, would
fail to report that file ‘foo’ has been deleted.
The strace of a correct execution (x86_64) is like this:
--8<---------------cut here---------------start------------->8---
lstat("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
lstat("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
inotify_init() = 5
write(1, "==> a <==\nx\n\n==> foo <==\n", 25) = 25
inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1
inotify_add_watch(5, "a", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 2
inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1
inotify_add_watch(5, "foo", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 3
open("a", O_RDONLY|O_NONBLOCK) = 6
lstat("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
fstatfs(6, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=16738210,
f_bfree=1219533, f_bavail=367617, f_files=4259840, f_ffree=2406492,
f_fsid={1622537548, 1497272261}, f_namelen=255, f_frsize=4096}) = 0
close(6) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
open("foo", O_RDONLY|O_NONBLOCK) = 6
lstat("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstatfs(6, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=16738210,
f_bfree=1219533, f_bavail=367617, f_files=4259840, f_ffree=2406492,
f_fsid={1622537548, 1497272261}, f_namelen=255, f_frsize=4096}) = 0
close(6) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(5, "\3\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0", 20) = 16
open("foo", O_RDONLY|O_NONBLOCK) = -1 ENOENT (No such file or directory)
lstat("foo", 0x7ffee174c3b0) = -1 ENOENT (No such file or directory)
write(2, "tail: ", 6) = 6
write(2, "foo", 3) = 3
write(2, ": No such file or directory", 27) = 27
write(2, "\n", 1) = 1
close(4) = 0
read(5, "\3\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0", 20) = 16
inotify_rm_watch(5, 3) = -1 EINVAL (Invalid argument)
--8<---------------cut here---------------end--------------->8---
For a *failing* execution (armhf), we get:
--8<---------------cut here---------------start------------->8---
lstat64("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
lstat64("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
inotify_init() = 5
write(1, "==> a <==\nx\n\n==> foo <==\n", 25) = 25
inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1
inotify_add_watch(5, "a", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 2
inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1
inotify_add_watch(5, "foo", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 3
open("a", O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 6
lstat64("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
fstatfs64(6, 88, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=37378337,
f_bfree=36178942, f_bavail=34274471, f_files=9502720, f_ffree=9405759,
f_fsid={1592050960, 1812457140}, f_namelen=255, f_frsize=4096, f_flags=4128}) = 0
close(6) = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
open("foo", O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 6
lstat64("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstatfs64(6, 88, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=37378337,
f_bfree=36178942, f_bavail=34274471, f_files=9502720, f_ffree=9405759,
f_fsid={1592050960, 1812457140}, f_namelen=255, f_frsize=4096, f_flags=4128}) = 0
close(6) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(5, "\3\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0", 20) = 16
open("foo", O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 6
lstat64("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstatfs64(6, 88, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=37378337,
f_bfree=36178938, f_bavail=34274467, f_files=9502720, f_ffree=9405759,
f_fsid={1592050960, 1812457140}, f_namelen=255, f_frsize=4096, f_flags=4128}) = 0
close(6) = 0
read(5,
--8<---------------cut here---------------end--------------->8---
In both cases, reading from the inotify file descriptor (number 5)
returns a notification for watch #3 (corresponding to ‘foo’), with mask
IN_ATTRIB (value 4).
However, the open("foo") call that immediately follows does *not* return
ENOENT in the failing case: The file is still there.
The kernel’s ‘vfs_unlink’ goes like this:
fsnotify_link_count(target); /* IN_ATTRIB */
d_delete(dentry); /* fsnotify_nameremove → IN_DELETE for “.” */
So, ‘tail’ first receives the IN_ATTRIB notification corresponding to
the link count decrease on ‘foo’; at that point, ‘foo’ is still
available. And then, ‘tail’ should receive the IN_DELETE_SELF
notification, at which point ‘foo’ would have been actually been
unlinked. But in practice we don’t seem to be receiving IN_DELETE_SELF,
even in the succeeding case.
I think the problem happens when ‘tail’ opens ‘foo’ right in between of
the two notifications: ‘foo’ is still there, and so ‘tail’ doesn’t
report anything.
Does that make sense?
Thanks,
Ludo’.
--- End Message ---