bug-coreutils


From: Paul Eggert
Subject: bug#10281: [1003.1(2008)/Issue 7 0000527]: du and files found via multiple command line arguments
Date: Sun, 18 Dec 2011 14:03:49 -0800
User-agent: Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20111124 Thunderbird/8.0

Eric Blake's Option 1 does not appear to be tenable, as du
traditionally preserved hashes of duplicate files across all
of its operands.  7th Edition Unix 'du' did that, and (as
Jilles Tjoelker pointed out) so do at least two current 'du'
implementations, namely, FreeBSD and GNU.

The idea behind Eric's Option 2 is better, but its wording
is unclear partly because of another issue Jilles raised:
whether a file's disk space should be counted multiple times
if the file occurs multiple times and its link count is 1.
For example:

  mkdir d
  cd d
  cp /bin/sh w
  cp w y
  ln y ../y
  ln -s w x
  ln -s y z
  du -aL

This analyzes a directory with two regular files, 'w' and
'y'.  GNU and Solaris du count these files once each, giving
an accurate sum of non-symlink disk usage under the current
directory.  But w's link count is 1, so FreeBSD counts 'w'
twice (once as 'w' and once via the symlink 'x'), thus
overcounting disk usage.

The current POSIX wording does not say what to do for this
example, but the intent is to avoid overcounting disk usage,
and the GNU and Solaris behavior supports this intent better.
(The 7th Edition Unix behavior agrees with FreeBSD, but this
predates symbolic links so the behavior is now dubious.)
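
The two behaviors can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not
how any real du is implemented: the helper name
count_entries and its dedupe_single_link flag are
hypothetical, the walk is limited to one flat directory,
and small data files stand in for the copies of /bin/sh.
The "link count" policy follows the FreeBSD behavior as
described above.

```python
import os
import tempfile


def count_entries(root, dedupe_single_link):
    """Return the names in root that a 'du -aL'-style pass would count.

    dedupe_single_link=True  remembers every file by (st_dev, st_ino).
    dedupe_single_link=False remembers only files whose link count
    exceeds 1, so a single-link file reached twice is counted twice.
    """
    seen = set()
    counted = []
    for name in sorted(os.listdir(root)):
        st = os.stat(os.path.join(root, name))  # follows symlinks, like -L
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # already counted under another name
        if dedupe_single_link or st.st_nlink > 1:
            seen.add(key)
        counted.append(name)
    return counted


with tempfile.TemporaryDirectory() as top:
    d = os.path.join(top, "d")
    os.mkdir(d)
    for name in ("w", "y"):                      # stand-ins for the cp steps
        with open(os.path.join(d, name), "wb") as f:
            f.write(b"x" * 4096)
    os.link(os.path.join(d, "y"), os.path.join(top, "y"))  # ln y ../y
    os.symlink("w", os.path.join(d, "x"))                  # ln -s w x
    os.symlink("y", os.path.join(d, "z"))                  # ln -s y z

    once = count_entries(d, dedupe_single_link=True)
    by_link_count = count_entries(d, dedupe_single_link=False)
    print(once)
    print(by_link_count)
```

With the first policy only 'w' and 'y' are counted.  With
the second, 'x' is counted as well, because 'w' has a link
count of 1 and so is never remembered.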

Given all the above, the standard's wording could be
improved in several different ways, all elaborations of
Option 2.  Here are two possibilities:

  Option 2A - require that files be hashed among all
  operands, and that disk usage be counted at most once.

    Change line 84170 [du DESCRIPTION] from:

      Files with multiple links shall be counted and written
      for only one entry.

    to:

      A file that occurs multiple times shall be counted and
      written for only one entry, even if the occurrences
      are under different file operands.

  Option 2B - leave unspecified whether files are hashed
  among all operands, and leave unspecified whether disk
  usage is counted multiple times for files whose link
  count does not exceed 1.  From the user's point of view,
  this means du's output is a reliable count of disk usage
  only if du is invoked without -L and with -x and with at
  most one operand.

    Change line 84170 [du DESCRIPTION] from:

      Files with multiple links shall be counted and written
      for only one entry.

    to:

      A file that occurs multiple times under one file
      operand and that has a link count greater than 1 shall
      be counted and written for only one entry.  It is
      implementation-defined whether a file that has a link
      count no greater than 1 is counted and written just
      once, or is counted and written for each occurrence.
      It is implementation-defined whether a file that
      occurs under one file operand is counted for other
      file operands.

Option 2A is simpler and clearer, but it invalidates many
existing implementations.  Option 2B modifies the standard
to describe how existing implementations actually work, but
is more complicated and more of a hassle to use reliably.
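
The difference between hashing among all operands (as 2A
requires) and hashing per operand (as 2B permits) can be
sketched as follows.  This is a rough model, not any
implementation's actual code: counted_per_operand and its
share_seen flag are hypothetical names, directories
themselves are not counted, and symlinks are not followed.

```python
import os
import tempfile


def counted_per_operand(operands, share_seen):
    """Map each operand's basename to the file names counted under it.

    share_seen=True keeps one set of (st_dev, st_ino) pairs for the
    whole run; share_seen=False starts a fresh set for each operand.
    """
    seen = set()
    result = {}
    for operand in operands:
        if not share_seen:
            seen = set()  # forget files found under earlier operands
        names = []
        for dirpath, dirnames, filenames in os.walk(operand):
            dirnames.sort()  # make the traversal order deterministic
            for name in sorted(filenames):
                st = os.lstat(os.path.join(dirpath, name))
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue
                seen.add(key)
                names.append(name)
        result[os.path.basename(operand)] = names
    return result


with tempfile.TemporaryDirectory() as top:
    a = os.path.join(top, "a")
    b = os.path.join(top, "b")
    os.mkdir(a)
    os.mkdir(b)
    with open(os.path.join(a, "f"), "w") as fh:
        fh.write("data\n")
    os.link(os.path.join(a, "f"), os.path.join(b, "g"))  # b/g is a/f
    with open(os.path.join(b, "h"), "w") as fh:
        fh.write("other\n")

    shared = counted_per_operand([a, b], share_seen=True)
    separate = counted_per_operand([a, b], share_seen=False)
    print(shared)
    print(separate)
```

With a shared set, 'g' is skipped under the second operand
because the same file was already counted as 'f' under the
first.  With per-operand sets it is counted again, so the
totals for the two operands no longer add up to the disk
space actually used.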

Eric raised one other issue: the description of the -a
option implies that "du A B" must always list B.  This
implication is incorrect for 7th Edition Unix du, GNU du,
and (I expect) FreeBSD du, so it should be fixed as well.
Here's one possible fix, which is independent of the
above-mentioned changes:

  Change line ????? [du OPTIONS] from:

    Regardless of the presence of the -a option,
    non-directories given as file operands shall always
    be listed.

  to:

    The -a option does not affect whether
    non-directories given as file operands are listed.

(Sorry, I don't know the line number here; I don't have a
PDF copy of the current standard and don't know offhand how
to get one.)





