Re: [PATCH] Support for --size in du


From: Bernhard Voelker
Subject: Re: [PATCH] Support for --size in du
Date: Thu, 17 Jan 2013 08:19:37 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130105 Thunderbird/17.0.2

On 01/17/2013 02:46 AM, Pádraig Brady wrote:
> On 01/17/2013 01:23 AM, Bernhard Voelker wrote:
>> I was pretty sure that this slipped also from Padraig's list.
> 
> Sorry for the delay in this.
> 
> Note it's still on the list:
> http://www.pixelbeat.org/patches/coreutils/inbox_dec_2012.html
> 
> You can browse older news and subscribe to new updates at:
> http://www.pixelbeat.org/patches/coreutils/

Thanks for the links.

>> Therefore, I took Jakob's patch and amended it with documentation
>> and a comprehensive test. ;-)
> 
> Wow great work on the test.

Well, that test just grew and grew. It's actually a result of
me not being 100% happy with the --size option, because in some
situations it might confuse people more than it helps.

For example, users tend to "think in apparent sizes" for their
files rather than in block sizes.

Take a directory like this:

  $ find tmp -exec ls -dog '{}' +
  drwxr-xr-x 5      4096 Jan 17 07:28 tmp
  drwxr-xr-x 2      4096 Jan 17 07:29 tmp/big_dir
  -rw-r--r-- 1 104857600 Jan 17 07:29 tmp/big_dir/big_file
  drwxr-xr-x 2      4096 Jan 17 07:25 tmp/empty_dir
  drwxr-xr-x 2      4096 Jan 17 07:28 tmp/small_dir
  -rw-r--r-- 1         6 Jan 17 07:26 tmp/small_dir/small_file
  -rw-r--r-- 1         0 Jan 17 07:22 tmp/x0
  -rw-r--r-- 1         1 Jan 17 07:22 tmp/x1
  -rw-r--r-- 1        10 Jan 17 07:22 tmp/x2
  -rw-r--r-- 1       100 Jan 17 07:22 tmp/x3
  -rw-r--r-- 1      1000 Jan 17 07:22 tmp/x4
  -rw-r--r-- 1     10000 Jan 17 07:22 tmp/x5
  -rw-r--r-- 1    100000 Jan 17 07:22 tmp/x6
  -rw-r--r-- 1   1000000 Jan 17 07:22 tmp/x7

Then filter for files and directories of at least 4000 bytes:

  $ src/du -B1 -a --size=4000 tmp | sort -k2
  106012672  tmp
  104861696  tmp/big_dir
  104857600  tmp/big_dir/big_file
  4096       tmp/empty_dir
  8192       tmp/small_dir
  4096       tmp/small_dir/small_file
  4096       tmp/x1
  4096       tmp/x2
  4096       tmp/x3
  4096       tmp/x4
  12288      tmp/x5
  102400     tmp/x6
  1003520    tmp/x7

This also included small files such as tmp/x1, because each of
them occupies a full 4096-byte block, while it left out the empty
file tmp/x0 ... and yet it included the empty directory
tmp/empty_dir. This feels somewhat counter-intuitive.
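
The same effect can be reproduced without the patch by post-filtering
the output of stock du. This is only a rough sketch, assuming GNU du
and awk; du_min is a hypothetical helper name and is not the patch's
--size implementation:

```shell
# du_min MIN_BYTES [DU_OPTIONS...] PATH
# Hypothetical helper: list entries whose reported size is >= MIN_BYTES.
# It post-filters stock GNU du output; it is not the patch's --size code.
du_min() {
  min=$1
  shift
  # -B1 reports sizes in bytes; -a includes files as well as directories.
  du -B1 -a "$@" | awk -v min="$min" '$1 + 0 >= min + 0'
}

# Allocated size (du's default): a 1-byte file counts as one full block.
#   du_min 4000 tmp
# Apparent size: only entries whose byte length reaches the threshold.
#   du_min 4000 --apparent-size tmp
```

Filtering after the fact keeps du's own accounting untouched, so the
difference between the two calls comes purely from which size du reports.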

Now let's use the "apparent size":

  $ src/du -B1 -a --size=4000 --app tmp | sort -k2
  105985101 tmp
  104861696 tmp/big_dir
  104857600 tmp/big_dir/big_file
  4096      tmp/empty_dir
  4102      tmp/small_dir
  10000     tmp/x5
  100000    tmp/x6
  1000000   tmp/x7

This is much better. Well, the empty directory still shows up
here (which might be different on a different file system),
but at least the small files have gone.

That said, it seems that automatically applying --apparent-size
when both -a and --size are specified would give a more "natural"
result.

In practice, users will probably search only for huge files
and directories, i.e. much greater than the file system's
block size, but even then they'd be caught out by forgetting the
--app option when it comes to sparse files:

  $ src/truncate --size=1T tmp/sparse-1T

  $ src/du -h -a --size=100M tmp
  100M    tmp/big_dir/big_file
  101M    tmp/big_dir
  102M    tmp

  $ src/du -h -a --size=100M --app tmp
  100M    tmp/big_dir/big_file
  101M    tmp/big_dir
  1.0T    tmp/sparse-1T
  1.1T    tmp
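
The gap between the two views of a sparse file can be checked with
stock tools as well. A minimal sketch, assuming GNU truncate and du;
sparse_sizes is a hypothetical helper name used only for illustration:

```shell
# sparse_sizes FILE SIZE
# Hypothetical helper: create a sparse FILE of apparent size SIZE and
# print "<allocated bytes> <apparent bytes>" as reported by GNU du.
sparse_sizes() {
  truncate --size="$2" "$1" || return 1
  # du separates the size and the name with a tab; keep only the size.
  allocated=$(du -B1 "$1" | cut -f1)
  apparent=$(du -B1 --apparent-size "$1" | cut -f1)
  printf '%s %s\n' "$allocated" "$apparent"
}

# Example: sparse_sizes tmp/sparse-1T 1T
```

On a file system that supports holes, the first number stays small while
the second one is the full requested size, which is exactly why a
threshold on the default (allocated) size skips the file.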

The only way out of this confusion (which is probably only mine)
would be to forbid combining the -a and --size options.
But this would artificially restrict the user's flexibility.

Does anyone else feel the same way?


> I wonder would it make sense to have consistent --size
> handling for du and truncate. I.E. have --size='<10M'
> specify the max size and --size='>10M' specify the min size?

I personally do not much like shell-special characters in optargs,
as many users will forget to quote them; an unquoted --size=<10M
may not be a great problem (the shell just tries to read from a
file named 10M), but an unquoted --size=>10M truncates a file
named 10M and so may destroy data.
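
The quoting hazard does not depend on du at all, so it can be shown
with a command that ignores its arguments. A small sketch, where
redirect_demo is a hypothetical name and "true" merely stands in for
any command that would take such an option:

```shell
# redirect_demo: show that an unquoted '>' inside an option argument is
# parsed by the shell as a redirection. Everything happens in a scratch
# directory inside a subshell, so the caller's directory is untouched.
redirect_demo() (
  cd "$(mktemp -d)" || exit 1
  printf 'important data\n' > 10M   # a file that happens to be named 10M
  true --size=>10M                  # parsed as: true --size= > 10M
  printf 'unquoted: %s bytes\n' "$(wc -c < 10M | tr -d ' ')"
  printf 'important data\n' > 10M
  true --size='>10M'                # quoted: reaches the command verbatim
  printf 'quoted: %s bytes\n' "$(wc -c < 10M | tr -d ' ')"
)
```

With the default shell settings (noclobber off), the unquoted call
leaves the file empty, while the quoted call leaves its 15 bytes intact.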

I was rather thinking of making it more consistent with
"find tmp -size +10M", or even of teaching find a new -csize
(cumulative size) option ... as finding big directories was
the original problem. On the other hand, 'find' doesn't offer
the flexibility to filter based on the block size, i.e. it
would always include huge sparse files although these do
not fill up the file system.
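
For comparison, a per-file filter on allocated size can be approximated
with GNU find's -printf, whose %b directive prints the allocation in
512-byte blocks. This is only a sketch: big_by_allocation is a
hypothetical helper, it does nothing for cumulative directory sizes,
and it breaks on file names containing tabs or newlines:

```shell
# big_by_allocation MIN_BYTES DIR
# Hypothetical helper: list regular files under DIR whose allocated size
# (not apparent size) is at least MIN_BYTES. Assumes GNU find.
big_by_allocation() {
  find "$2" -type f -printf '%b\t%p\n' |
    awk -F '\t' -v min="$1" '$1 * 512 >= min + 0 { print $2 }'
}

# Apparent size, sparse files included:
#   find tmp -type f -size +10M
# Allocated size, sparse files skipped:
#   big_by_allocation 10485760 tmp
```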

Maybe the current implementation is still the better way ...

Have a nice day,
Berny

P.S. Thanks for reading down here. ;-)
