coreutils

Re: Command-line program to convert 'human' sizes?


From: Pádraig Brady
Subject: Re: Command-line program to convert 'human' sizes?
Date: Thu, 06 Dec 2012 23:59:50 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

Generally it's best to get git to send email,
or to send around formats that git can apply directly,
which carry the commit message, any new files, etc.
The handiest way to do that is:

  git format-patch --stdout -1 | gzip > numfmt.5.patch.gz

Then it can be applied on a new branch like:

  gzip -dc numfmt.5.patch.gz | git am

I've attached such a patch to this mail, which also includes
a couple of tweaks to NEWS, scripts/git-hooks/commit-msg
and man/.gitignore. Note: to enable the updated commit-msg git hook
without having to rerun bootstrap, just run:

  cp --backup=numbered scripts/git-hooks/commit-msg .git/hooks


So on to some initial observations...

I noticed that this command dumps core:
$ /bin/ls -l | src/numfmt --to-unit=1 --field=5

That's because there are only 2 fields on the first "total" line from ls.
I also hit a similar issue when trying to process the output from df,
i.e. the first line is non-numeric and would ideally be skipped.
This is very similar to the --header option implemented for `join`,
so I'm thinking `numfmt` should support --header too.
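
A minimal C sketch of how both problems might be handled, assuming blank-separated fields and a hypothetical --header=N meaning "pass the first N lines through unchanged". The helpers find_field() and field_to_convert() are illustrative names, not code from the patch. A line that is a header, or that lacks the requested field (like the "total" line), is simply reported as "print unchanged" instead of being dereferenced.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: locate blank-separated field FIELD (1-based) in
   LINE.  On success store its start and length and return true; return
   false if the line has fewer fields (e.g. the "total" line of ls -l).  */
bool find_field(const char *line, size_t field,
                const char **start, size_t *len)
{
  assert(field >= 1);
  const char *p = line;
  for (size_t n = 1; ; n++)
    {
      while (isblank((unsigned char) *p))   /* skip separators */
        p++;
      if (*p == '\0' || *p == '\n')
        return false;                       /* ran out of fields */
      const char *q = p;
      while (*q != '\0' && !isspace((unsigned char) *q))
        q++;
      if (n == field)
        {
          *start = p;
          *len = (size_t) (q - p);
          return true;
        }
      p = q;
    }
}

/* Decide whether line LINE_NO (1-based) gets field FIELD converted.
   false means "print the line unchanged": it is one of the first
   HEADER_LINES lines (a hypothetical --header=N), or it is too short.  */
bool field_to_convert(const char *line, size_t line_no, size_t header_lines,
                      size_t field, const char **start, size_t *len)
{
  if (line_no <= header_lines)
    return false;
  return find_field(line, field, start, len);
}
```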

OK, so avoiding the segfault by stripping the first line,
let's go through some examples processing this data:

$ /bin/ls -l | tail -n+2 | head -n3
-rw-rw-r--.  1 padraig padraig   93787 Aug 23  2011 ABOUT-NLS
-rw-rw-r--.  1 padraig padraig   49630 Dec  6 22:32 aclocal.m4
-rw-rw-r--.  1 padraig padraig    3669 Dec  6 22:29 AUTHORS

The following should essentially be a no-op with this data,
but notice how the original spacing isn't taken
into account, and thus the alignment is broken:

$ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to-unit=1 --field=5
-rw-rw-r--.  1 padraig padraig 93787 Aug 23  2011 ABOUT-NLS
-rw-rw-r--.  1 padraig padraig 49630 Dec  6 22:32 aclocal.m4
-rw-rw-r--.  1 padraig padraig 3669 Dec  6 22:29 AUTHORS
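
One way the original spacing could be respected, sketched in C under the assumption that columns are padded with spaces (tabs are not handled): treat the spaces before the field, minus one kept as a separator, as padding, and right-align the new text in that combined width so the field still ends in the same column. replace_field_aligned() is a hypothetical helper; a replacement that is too wide is printed in full and only shifts the rest of that one line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy LINE to OUT, replacing the FIELD_LEN bytes at
   FIELD_START with REPL right-aligned so the field still ends in the same
   column.  The spaces before the field, minus one kept as a separator,
   count as padding the replacement may grow into.  Returns the number of
   bytes written, or -1 if OUT is too small.  */
int replace_field_aligned(const char *line, size_t field_start,
                          size_t field_len, const char *repl,
                          char *out, size_t outsize)
{
  assert(field_start + field_len <= strlen(line));

  /* Walk back over the padding spaces before the field.  */
  size_t pad_start = field_start;
  while (pad_start > 0 && line[pad_start - 1] == ' ')
    pad_start--;

  /* Keep one separator unless the field starts the line.  */
  int sep = (pad_start > 0 && pad_start < field_start) ? 1 : 0;
  int width = (int) (field_start - pad_start) - sep + (int) field_len;

  /* %*s right-aligns REPL; if REPL is wider it is printed in full.  */
  int n = snprintf(out, outsize, "%.*s%*s%*s%s",
                   (int) pad_start, line,
                   sep, "",
                   width, repl,
                   line + field_start + field_len);
  return (n < 0 || (size_t) n >= outsize) ? -1 : n;
}
```

With the ABOUT-NLS line, replacing 93787 by 3.7K gives "padraig    3.7K Aug 23", which ends in the same column as the original size.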

With the following, the alignment is broken as before,
but notice also that each number is output with a different width:

$ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to=SI --field=5
-rw-rw-r--.  1 padraig padraig 94k Aug 23  2011 ABOUT-NLS
-rw-rw-r--.  1 padraig padraig 50k Dec  6 22:32 aclocal.m4
-rw-rw-r--.  1 padraig padraig 3.7k Dec  6 22:29 AUTHORS

I would expect the algorithm for the above to determine the
available width of the field when parsing the first number
and fit each number to it, but, for these auto-scaled units,
not to exceed a width of 6 for SI or 7 for IEC, giving output like:

$ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to=SI --field=5
-rw-rw-r--.  1 padraig padraig 94.0K Aug 23  2011 ABOUT-NLS
-rw-rw-r--.  1 padraig padraig 50.0K Dec  6 22:32 aclocal.m4
-rw-rw-r--.  1 padraig padraig  3.7K Dec  6 22:29 AUTHORS

Notice that in the above I've used a capital K for SI.
I think human() from gnulib may use k for 1000 and K for 1024.
That's non-standard and ambiguous, and I see no need to do that.
So for IEC we'd have:

$ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to=IEC --field=5
-rw-rw-r--.  1 padraig padraig 92.0Ki Aug 23  2011 ABOUT-NLS
-rw-rw-r--.  1 padraig padraig 49.0Ki Dec  6 22:32 aclocal.m4
-rw-rw-r--.  1 padraig padraig  3.6Ki Dec  6 22:29 AUTHORS

And with a suffix:

$ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to=IEC --field=5 --suffix=B
-rw-rw-r--.  1 padraig padraig 92.0KiB Aug 23  2011 ABOUT-NLS
-rw-rw-r--.  1 padraig padraig 49.0KiB Dec  6 22:32 aclocal.m4
-rw-rw-r--.  1 padraig padraig  3.6KiB Dec  6 22:29 AUTHORS
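
A rough C sketch of the auto-scaling step, assuming ceiling rounding (which the sample figures appear to use, e.g. 49630 bytes shown as 49Ki), one decimal place when the scaled value is below 10, a capital K for both SI and IEC with "i" appended for IEC, and an optional suffix such as B. format_scaled() is a hypothetical name. Unlike the samples it prints 94K rather than 94.0K; padding to a fixed width is left to the alignment step.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: format V scaled by powers of 1000 (SI) or 1024
   (IEC), rounding up, with one decimal place when the scaled value is
   below 10.  Prefixes use a capital K for both bases; IEC appends "i".
   SUFFIX (e.g. "B" or "") is appended last.  Returns snprintf's result.  */
int format_scaled(unsigned long long v, int iec, const char *suffix,
                  char *out, size_t outsize)
{
  static const char *const prefix[] = { "", "K", "M", "G", "T", "P", "E" };
  const int nprefix = (int) (sizeof prefix / sizeof prefix[0]);
  const unsigned long long base = iec ? 1024 : 1000;

  assert(v <= ULLONG_MAX / 10);   /* keeps v * 10 below from overflowing */

  /* Pick the largest prefix that leaves a scaled value >= 1.  */
  int p = 0;
  unsigned long long d = 1;
  while (p + 1 < nprefix && v / d >= base)
    {
      d *= base;
      p++;
    }

  for (;;)
    {
      if (p == 0)
        return snprintf(out, outsize, "%llu%s", v, suffix);

      const char *i = iec ? "i" : "";
      /* ceil(v * 10 / d): the scaled value in tenths, rounded up.  */
      unsigned long long tenths = v * 10 / d + (v * 10 % d != 0);
      if (tenths < 100)
        return snprintf(out, outsize, "%llu.%llu%s%s%s",
                        tenths / 10, tenths % 10, prefix[p], i, suffix);

      unsigned long long whole = v / d + (v % d != 0);   /* ceil(v / d) */
      if (whole < base || p + 1 == nprefix)
        return snprintf(out, outsize, "%llu%s%s%s",
                        whole, prefix[p], i, suffix);

      /* Rounding up reached the next power (e.g. 999999 -> 1000K),
         so move to the next prefix and try again.  */
      d *= base;
      p++;
    }
}
```

For the three files above this gives 94K, 50K and 3.7K for SI, and 92KiB, 49KiB and 3.6KiB for IEC with a B suffix, matching the digits in the samples.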

Another thing I thought of there is that it would be
good to be able to parse the number formats that numfmt itself can generate:

$ echo '1,234' | src/numfmt --from=auto
src/numfmt: invalid suffix in input '1,234': ',234'
$ echo '3.7K' | src/numfmt --from=auto
src/numfmt: invalid suffix in input '3.7K': '.7K'

While I said before that it would be better to error rather than warn
on a parse error, on reflection it's probably best to write a
warning to stderr and leave the original number in place.
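
The warn-and-keep behaviour could look something like this in C. convert_or_keep(), the convert_fn type and toy_convert() are hypothetical, and the warning text is only illustrative; the point is that a field which fails to parse is reported on stderr and copied through unchanged, so processing continues with the rest of the input.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical converter type: write the reformatted FIELD to OUT and
   return true, or return false if FIELD cannot be parsed.  */
typedef bool (*convert_fn)(const char *field, char *out, size_t outsize);

/* Warn-and-keep policy: on a parse failure, print a warning to stderr and
   copy FIELD to OUT unchanged so processing can continue.  Returns true
   only if the field was converted.  */
bool convert_or_keep(const char *field, convert_fn convert,
                     char *out, size_t outsize)
{
  assert(field != NULL && convert != NULL);
  if (convert(field, out, outsize))
    return true;
  fprintf(stderr, "numfmt: invalid number: '%s'\n", field);  /* illustrative */
  snprintf(out, outsize, "%s", field);
  return false;
}

/* Toy converter for illustration only: accepts a plain decimal integer
   and echoes it back with a "B" suffix.  */
bool toy_convert(const char *field, char *out, size_t outsize)
{
  char *end;
  errno = 0;
  long long v = strtoll(field, &end, 10);
  if (end == field || *end != '\0' || errno == ERANGE)
    return false;
  snprintf(out, outsize, "%lldB", v);
  return true;
}
```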

thanks,
Pádraig.

Attachment: numfmt.5.patch.gz
Description: GNU Zip compressed data

