From: Paul Eggert
Subject: bug#10281: [1003.1(2008)/Issue 7 0000527]: du and files found via multiple command line arguments
Date: Sun, 18 Dec 2011 14:03:49 -0800
User-agent: Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20111124 Thunderbird/8.0

Eric Blake's Option 1 does not appear to be tenable, because du
has traditionally kept its table of already-seen files across
all of its operands. 7th Edition Unix 'du' did that, and (as
Jilles Tjoelker pointed out) so do at least two current 'du'
implementations, namely FreeBSD and GNU.

The idea behind Eric's Option 2 is better, but its wording
is unclear, partly because of another issue Jilles raised:
whether a file's disk space should be counted multiple times
if the file occurs multiple times and its link count is 1.

For example:

    mkdir d
    cd d
    cp /bin/sh w
    cp w y
    ln y ../y
    ln -s w x
    ln -s y z
    du -aL
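
This analyzes a directory with two regular files, 'w' and
'y'. Because -L makes du follow the symlinks 'x' and 'z', it
encounters each regular file twice. 'y' has link count 2
(because of ../y), so it is counted only once. GNU and
Solaris du also count 'w' once, giving an accurate sum of
non-symlink disk usage under the current directory. But w's
link count is 1, so FreeBSD counts 'w' twice, thus
overcounting disk usage.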
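
The difference comes down to which files du remembers. The
Python sketch below is a hypothetical model, not code from any
du implementation: it walks a tree while following symlinks (as
-L does), identifies files by (st_dev, st_ino), and either
remembers every file (the GNU/Solaris behavior described above)
or only files whose link count exceeds 1 (the FreeBSD behavior
described above). It ignores directories and block counts and
just returns the names whose space would be counted.

```python
import os


def counted_paths(root, hash_every_file):
    """Return the names under root whose disk usage a du -L style walk counts.

    Hypothetical model: files are identified by (st_dev, st_ino).
    hash_every_file=True remembers every file (GNU/Solaris behavior as
    described above); False remembers only files with st_nlink > 1
    (FreeBSD behavior as described above). Directories are ignored.
    """
    seen = set()
    counted = []
    # followlinks=True and os.stat() both follow symlinks, mimicking -L.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if hash_every_file or st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue  # same file reached under another name
                seen.add(key)
            counted.append(os.path.relpath(path, root))
    return counted
```

Built as in the example (with a small ordinary file standing in
for /bin/sh), the directory 'd' yields ['w', 'y'] when every file
is remembered, and ['w', 'x', 'y'] when only multiply-linked files
are, i.e. 'w' is counted a second time through the symlink 'x'.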

The current POSIX wording does not say what to do for this
example, but the intent is to avoid overcounting disk usage,
and the GNU and Solaris behavior supports this intent better.
(The 7th Edition Unix behavior agrees with FreeBSD, but 7th
Edition predates symbolic links, so that precedent is now
dubious.)

Given all the above, the standard's wording could be
improved in several ways, all of them elaborations of
Option 2. Here are two possibilities:

Option 2A - require that files be hashed among all
operands, and that disk usage be counted at most once.

Change line 84170 [du DESCRIPTION] from:

    Files with multiple links shall be counted and written
    for only one entry.

to:

    A file that occurs multiple times shall be counted and
    written for only one entry, even if the occurrences
    are under different file operands.

Option 2B - leave unspecified whether files are hashed
among all operands, and leave unspecified whether disk
usage is counted multiple times for files whose link
count does not exceed 1. From the user's point of view,
this means du's output is a reliable count of disk usage
only if du is invoked without -L, with -x, and with at
most one operand.

Change line 84170 [du DESCRIPTION] from:

    Files with multiple links shall be counted and written
    for only one entry.

to:

    A file that occurs multiple times under one file
    operand and that has a link count greater than 1 shall
    be counted and written for only one entry. It is
    implementation-defined whether a file that has a link
    count no greater than 1 is counted and written just
    once, or is counted and written for each occurrence.
    It is implementation-defined whether a file that
    occurs under one file operand is counted for other
    file operands.

Option 2A is simpler and clearer, but it invalidates many
existing implementations. Option 2B modifies the standard
to describe how existing implementations actually work, but
is more complicated and more of a hassle to use reliably.
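
Options 2A and 2B differ mainly in whether du's table of seen
files survives from one operand to the next. The following
hypothetical Python sketch turns that into a flag. For
simplicity it does not follow symlinks, remembers every file it
sees, and adds up apparent sizes (st_size) rather than allocated
blocks, so it illustrates the bookkeeping, not real du output.

```python
import os


def operand_totals(operands, share_seen_across_operands):
    """Return {operand: bytes counted} for a hypothetical du-like walk.

    Files are identified by (st_dev, st_ino). With
    share_seen_across_operands=True one seen-set is kept for the whole
    run (as Option 2A requires); with False it is reset per operand.
    """
    seen = set()
    totals = {}
    for op in operands:
        if not share_seen_across_operands:
            seen = set()  # forget files seen under earlier operands
        if os.path.isdir(op):
            paths = [
                os.path.join(dirpath, name)
                for dirpath, _dirnames, filenames in os.walk(op)
                for name in sorted(filenames)
            ]
        else:
            paths = [op]
        total = 0
        for path in paths:
            st = os.lstat(path)  # no -L: do not follow symlinks
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # already charged to an earlier entry
            seen.add(key)
            total += st.st_size
        totals[op] = total
    return totals
```

With a directory A containing a 5-byte file f, and an operand B
that is a hard link to A/f, sharing the table gives {A: 5, B: 0},
since B's space was already charged to A; resetting it gives
{A: 5, B: 5}, charging the same bytes to both operands.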

Eric raised one other issue: the description of the -a
option implies that "du A B" must always list B, even when
B is a non-directory that was already counted under A (for
example, a hard link to a file inside A). This implication
is incorrect for 7th Edition Unix du, GNU du, and (I expect)
FreeBSD du, so it should be fixed as well.

Here's one possible fix, which is independent of the
changes above.

Change line ????? [du OPTIONS] from:

    Regardless of the presence of the -a option,
    non-directories given as file operands shall always
    be listed.

to:

    The -a option does not affect whether
    non-directories given as file operands are listed.
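
The proposed wording separates two questions: whether an entry
is written at all (it may already have been counted), and
whether -a has any say. A hypothetical Python predicate modeling
that reading (the function and parameter names are made up for
illustration and do not come from any implementation) might be:

```python
def lists_entry(is_operand, is_directory, all_flag, already_counted):
    """Hypothetical model of whether a du-like tool writes a line for an entry."""
    if already_counted:
        # A file counted earlier (perhaps under another operand) is not
        # written again, whether or not -a was given.
        return False
    if is_directory or is_operand:
        # Directories, and non-directories named as operands, are
        # written regardless of -a.
        return True
    # Non-directories found inside a directory are written only with -a.
    return all_flag
```

Under this model, in "du A B" the operand B is written unless it
was already counted under A, and -a plays no part in that
decision.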

(Sorry, I don't know the line number here; I don't have a
PDF copy of the current standard and don't know offhand how
to get one.)