bug-coreutils

bug#6557: du sometimes miscounts directories, and files whose link count equals 1


From: Jim Meyering
Subject: bug#6557: du sometimes miscounts directories, and files whose link count equals 1
Date: Sat, 03 Jul 2010 10:36:00 +0200

Jim Meyering wrote:
> Paul Eggert wrote:
>> (I found this bug by code inspection while doing the du performance
>> improvement reported in:
>> http://lists.gnu.org/archive/html/bug-coreutils/2010-07/msg00014.html
>> )
>>
>> Unless -l is given, du is not supposed to count the same file more
>> than once.  It optimizes this test by not bothering to put a file into
>> the hash table if its link count is 1, or if it is a directory.  But
>> this optimization is not correct if -L is given (because the same
>> link-count-1 file, or directory, can be seen via symbolic links) or if
>> two or more arguments are given (because the same such file can be
>> seen under multiple arguments).  The optimization should be suppressed
>> if -L is given, or if multiple arguments are given.
>>
>> Here is a patch, with a couple of test cases for it.  This patch
>> assumes the du performance fix, but I can prepare an independent
>> patch if you like.
>
> Thanks!
> Actually, that patch applies just fine, as-is.
> However, it induces this new "make check" test failure:
...
> This is the additional patch we'd need to make the failing
> test accept your new output.  You're welcome to merge
> it into yours.

Actually I did that.
Here's the adjusted patch, for review.
Note the "du: " prefix on the one-line log summary -- that's
the part that goes into the Subject below.  Plus, I shortened it.
Also, I added a log line for the tests/du/files0-from change.
(BTW, the following is the output from "git format-patch --stdout -1".
It's easy to apply that by saving it in a FILE, then running "git am FILE")

From efe53cc72b599979ea292754ecfe8abf7c839d22 Mon Sep 17 00:00:00 2001
From: Paul Eggert <address@hidden>
Date: Fri, 2 Jul 2010 23:41:08 -0700
Subject: [PATCH] du: don't miscount duplicate directories or link-count-1 files

* NEWS: Mention this.
* src/du.c (hash_all): New static var.
(process_file): Use it.
(main): Set it.
* tests/du/hard-link: Add a couple of test cases to help make
sure this bug stays squashed.
* tests/du/files0-from: Adjust existing tests to reflect
change in semantics with duplicate arguments.
---
 NEWS                 |    5 +++++
 src/du.c             |   15 +++++++++++++--
 tests/du/files0-from |    8 ++++----
 tests/du/hard-link   |   44 ++++++++++++++++++++++++++++++--------------
 4 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/NEWS b/NEWS
index 3a24925..b02a223 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,11 @@ GNU coreutils NEWS                                    -*- outline -*-
   Also errors are no longer suppressed for unsupported file types, and
   relative sizes are restricted to supported file types.

+** Bug fixes
+
+  du no longer multiply counts a file that is a directory or whose
+  link count is 1, even if the file is reached multiple times by
+  following symlinks or via multiple arguments.

 * Noteworthy changes in release 8.5 (2010-04-23) [stable]

diff --git a/src/du.c b/src/du.c
index a90568e..4d6e03a 100644
--- a/src/du.c
+++ b/src/du.c
@@ -132,6 +132,9 @@ static bool apparent_size = false;
 /* If true, count each hard link of files with multiple links.  */
 static bool opt_count_all = false;

+/* If true, hash all files to look for hard links.  */
+static bool hash_all;
+
 /* If true, output the NUL byte instead of a newline at the end of each line.  */
 static bool opt_nul_terminate_output = false;

@@ -518,8 +521,7 @@ process_file (FTS *fts, FTSENT *ent)
      via a hard link, then don't let it contribute to the sums.  */
   if (skip
       || (!opt_count_all
-          && ! S_ISDIR (sb->st_mode)
-          && 1 < sb->st_nlink
+          && (hash_all || (! S_ISDIR (sb->st_mode) && 1 < sb->st_nlink))
           && ! hash_ins (sb->st_ino, sb->st_dev)))
     {
       /* Note that we must not simply return here.
@@ -937,11 +939,20 @@ main (int argc, char **argv)
                quote (files_from));

       ai = argv_iter_init_stream (stdin);
+
+      /* It's not easy here to count the arguments, so assume the
+         worst.  */
+      hash_all = true;
     }
   else
     {
       char **files = (optind < argc ? argv + optind : cwd_only);
       ai = argv_iter_init_argv (files);
+
+      /* Hash all dev,ino pairs if there are multiple arguments, or if
+         following non-command-line symlinks, because in either case a
+         file with just one hard link might be seen more than once.  */
+      hash_all = (optind + 1 < argc || symlink_deref_bits == FTS_LOGICAL);
     }

   if (!ai)
diff --git a/tests/du/files0-from b/tests/du/files0-from
index 620246d..860fc6a 100755
--- a/tests/du/files0-from
+++ b/tests/du/files0-from
@@ -70,15 +70,15 @@ my @Tests =
     {IN=>{f=>"g\0"}}, {AUX=>{g=>''}},
     {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],

-   # two file names, no final NUL
+   # two identical file names, no final NUL
    ['2', '--files0-from=-', '<',
     {IN=>{f=>"g\0g"}}, {AUX=>{g=>''}},
-    {OUT=>"0\tg\n0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
+    {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],

-   # two file names, with final NUL
+   # two identical file names, with final NUL
    ['2a', '--files0-from=-', '<',
     {IN=>{f=>"g\0g\0"}}, {AUX=>{g=>''}},
-    {OUT=>"0\tg\n0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
+    {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],

    # Ensure that $prog processes FILEs following a zero-length name.
    ['zero-len', '--files0-from=-', '<',
diff --git a/tests/du/hard-link b/tests/du/hard-link
index 7e4f51a..e22320b 100755
--- a/tests/du/hard-link
+++ b/tests/du/hard-link
@@ -26,24 +26,40 @@ fi
 . $srcdir/test-lib.sh

 mkdir -p dir/sub
-( cd dir && { echo non-empty > f1; ln f1 f2; echo non-empty > sub/F; } )
-
-
-# Note that for this first test, we transform f1 or f2
-# (whichever name we find first) to f_.  That is necessary because,
-# depending on the type of file system, du could encounter either of those
-# two hard-linked files first, thus listing that one and not the other.
-du -a --exclude=sub dir \
-  | sed 's/^[0-9][0-9]*        //' | sed 's/f[12]/f_/' > out || fail=1
-echo === >> out
-du -a --exclude=sub --count-links dir \
-  | sed 's/^[0-9][0-9]*        //' | sort -r >> out || fail=1
+( cd dir &&
+  { echo non-empty > f1
+    ln f1 f2
+    ln -s f1 f3
+    echo non-empty > sub/F; } )
+
+du -a -L --exclude=sub --count-links dir \
+  | sed 's/^[0-9][0-9]*        //' | sort -r > out || fail=1
+
+# For these tests, transform f1 or f2 or f3 (whichever name is found
+# first) to f_.  That is necessary because, depending on the type of
+# file system, du could encounter any of those linked files first,
+# thus listing that one and not the others.
+for args in '-L' 'dir' '-L dir'
+do
+  echo === >> out
+  du -a --exclude=sub $args dir \
+    | sed 's/^[0-9][0-9]*      //' | sed 's/f[123]/f_/' >> out || fail=1
+done
+
 cat <<\EOF > exp
+dir/f3
+dir/f2
+dir/f1
+dir
+===
 dir/f_
 dir
 ===
-dir/f2
-dir/f1
+dir/f_
+dir/f_
+dir
+===
+dir/f_
 dir
 EOF

--
1.7.2.rc1.192.g262ff
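
The adjusted tests/du/files0-from expectations (one output line for the operand "g" given twice) follow from recording each (device, inode) pair once. A minimal sketch of that idea, assuming a small fixed-size table in place of du's real hash table: `struct seen_set`, `seen_insert`, and `count_once` are hypothetical names, with `seen_insert` returning true only on first insertion, which is how the `! hash_ins (...)` test in the patch reads.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define SEEN_MAX 64   /* arbitrary capacity for this sketch */

struct seen_set
{
  dev_t dev[SEEN_MAX];
  ino_t ino[SEEN_MAX];
  size_t n;
};

/* Return true if (INO, DEV) was not yet in S, recording it when there
   is room.  Return false if it was already present.  When the table
   is full the pair is reported as new but not recorded.  */
bool
seen_insert (struct seen_set *s, ino_t ino, dev_t dev)
{
  for (size_t i = 0; i < s->n; i++)
    if (s->ino[i] == ino && s->dev[i] == dev)
      return false;

  if (s->n < SEEN_MAX)
    {
      s->ino[s->n] = ino;
      s->dev[s->n] = dev;
      s->n++;
    }
  return true;
}

/* Add SIZE to *TOTAL only the first time (INO, DEV) is seen.  */
void
count_once (struct seen_set *s, ino_t ino, dev_t dev,
            unsigned long size, unsigned long *total)
{
  if (seen_insert (s, ino, dev))
    *total += size;
}
```

Starting from a zero-initialized `struct seen_set`, calling `count_once` twice with the same inode and device adds the size only once, while the same inode number on a different device is treated as a distinct file.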




