bug-gnulib

‘unlinkat’ bug in Linux 4.0.2 leads to tar test failure


From: Ludovic Courtès
Subject: ‘unlinkat’ bug in Linux 4.0.2 leads to tar test failure
Date: Sun, 24 May 2015 13:33:49 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

(Please keep address@hidden Cc'd.)
(Gnulib: please scroll further down for the ‘unlinkat’ issue.)

Andy Patterson <address@hidden> skribis:

> > I suppose this is Guix 0.8.2 on top of another distribution, right?  Did
> > you install from source or from the binary tarball?  Did you enable
> > substitutes (info "(guix) Substitutes")?
> 
> I was using the USB install medium in a live environment.

So this is on GuixSD 0.8.2.  ‘test-suite.log’ indeed mentions
Linux-libre 4.0.2.

> I had substitutes enabled (I'm pretty sure they're enabled by default
> here, but I also enabled them manually just to be sure). I wasn't able
> to install anything with substitutes enabled; it would always stall
> while trying to update the substitutes list from hydra. When my
> network went down briefly, it informed me that it was still at 0.0%
> before exiting. I think that this is probably a separate issue, but
> one I was less concerned about since I didn't want to use
> substitutes anyway.

OK.

hydra.gnu.org is unfortunately too often overloaded these days, so you
probably arrived on a bad day.  Nevertheless, for this specific issue
the workaround is to use substitutes, which sidesteps the bug described
below.

>> Does the build succeed if you run it another time with:
>>
>>   guix build tar -K -c 1
>
> I tried this (with --no-substitutes), but I don't think the test suite
> actually runs in parallel. I didn't notice any difference in that regard
> when it was running; it seemed to take up the same amount of time with
> or without -c 1. I had the same tests fail with the flag enabled.

Oh you must be right.  Looking at tests/Makefile.in, I see:

--8<---------------cut here---------------start------------->8---
check-local: atconfig atlocal $(TESTSUITE)
        $(SHELL) $(TESTSUITE) $(TESTSUITEFLAGS)
--8<---------------cut here---------------end--------------->8---

... which shows that ./testsuite is not automatically passed -j,
contrary to what I thought.

<http://lists.gnu.org/archive/html/bug-tar/2014-08/msg00010.html>
reports a similar issue but on a different OS.

I just tried this in a GuixSD VM with Linux-libre 4.0.2:

--8<---------------cut here---------------start------------->8---
  mkdir foo
  mkdir bar
  echo foo/foo_file > foo/foo_file
  echo bar/bar_file > bar/bar_file
  tar -cvf foo.tar --remove-files -C foo . -C ../bar .
  find .
  stat bar
--8<---------------cut here---------------end--------------->8---

And indeed, it fails (that is, ‘bar’ is left behind).  It works fine on
4.0.4-gnu though.

On 4.0.2-gnu, I strace’d the ‘tar’ command above:

--8<---------------cut here---------------start------------->8---
openat(AT_FDCWD, "foo", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4

[...]

openat(4, ".", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 5

[...]

openat(5, "foo_file", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 6

[...]

openat(4, "../bar", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
newfstatat(5, ".", {st_mode=S_IFDIR|0755, st_size=60, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(5, ".", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 6

[...]

openat(6, "bar_file", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
write(1, "./bar_file\n", 11)            = 11
read(7, "x\n", 2)                       = 2
fstat(7, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
close(7)                                = 0
fstat(6, {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
brk(0x1a34000)                          = 0x1a34000
close(6)                                = 0
write(3, "./\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
10240) = 10240
close(3)                                = 0
unlinkat(4, "foo_file", 0)              = 0
unlinkat(AT_FDCWD, "foo", AT_REMOVEDIR) = 0
unlinkat(5, "bar_file", 0)              = 0
unlinkat(4, "../bar", AT_REMOVEDIR)     = -1 ENOENT (No such file or directory)
--8<---------------cut here---------------end--------------->8---

Contrast this with the same thing on 4.0.4-gnu:

--8<---------------cut here---------------start------------->8---
unlinkat(4, "foo_file", 0)              = 0
unlinkat(AT_FDCWD, "foo", AT_REMOVEDIR) = 0
unlinkat(5, "bar_file", 0)              = 0
unlinkat(4, "../bar", AT_REMOVEDIR)     = 0
--8<---------------cut here---------------end--------------->8---
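
To separate the kernel behaviour from ‘tar’ itself, the four ‘unlinkat’
calls above can be replayed without ‘tar’.  What follows is only a
minimal sketch in Python, assuming a Linux system where ‘os.unlink’ and
‘os.rmdir’ accept ‘dir_fd’; the function name and the temporary
directory are illustrative and come from neither tar nor Gnulib.  It
returns 0 when the last call succeeds and the errno value otherwise, so
one would presumably see ENOENT on an affected kernel:

```python
import errno
import os
import tempfile


def replay_unlinkat_sequence(workdir):
    """Replay the unlinkat calls from the trace inside WORKDIR.

    Return 0 if the final rmdir of "../bar" succeeds, else its errno.
    """
    foo = os.path.join(workdir, "foo")
    bar = os.path.join(workdir, "bar")
    os.mkdir(foo)
    os.mkdir(bar)
    for directory, name in ((foo, "foo_file"), (bar, "bar_file")):
        with open(os.path.join(directory, name), "w") as f:
            f.write("x\n")

    # fd_foo plays the role of descriptor 4 in the trace, fd_bar that of 5.
    fd_foo = os.open(foo, os.O_RDONLY | os.O_DIRECTORY)
    fd_bar = os.open("../bar", os.O_RDONLY | os.O_DIRECTORY, dir_fd=fd_foo)
    try:
        os.unlink("foo_file", dir_fd=fd_foo)  # unlinkat(4, "foo_file", 0)
        os.rmdir(foo)  # same effect as unlinkat(AT_FDCWD, "foo", AT_REMOVEDIR)
        os.unlink("bar_file", dir_fd=fd_bar)  # unlinkat(5, "bar_file", 0)
        try:
            # unlinkat(4, "../bar", AT_REMOVEDIR): ".." is resolved relative
            # to the descriptor of the already-removed "foo".
            os.rmdir("../bar", dir_fd=fd_foo)
        except OSError as e:
            return e.errno
        return 0
    finally:
        os.close(fd_bar)
        os.close(fd_foo)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = replay_unlinkat_sequence(tmp)
        if result == 0:
            print("last unlinkat succeeded")
        else:
            print("last unlinkat failed:", errno.errorcode.get(result, result))
```

If this replay fails in the same way as the trace, that would point at
the kernel rather than at tar or Gnulib.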

So this looks like a 4.0.2 kernel bug that Gnulib’s unlinkat should
perhaps work around.

Thoughts?

Thanks,
Ludo’.


