bug-coreutils

Re: Human readable sort


From: Michael Speer
Subject: Re: Human readable sort
Date: Sat, 25 Apr 2009 05:12:42 -0400

2009/4/24 Pádraig Brady <address@hidden>:
> Michael Speer wrote:
>> I wrote the following patch to the 7.2 branch of coreutils to allow
>> `sort` to sort by human readable byte sizes.  I looked around a bit to
>> see what the status of previous attempts to integrate this
>> functionality was, but didn't see any very recent activity.  This is
>> my first interaction with coreutils, so if I missed something obvious,
>> please point me towards it.
>>
>> Is the last potential patch
>> (http://www.mail-archive.com/address@hidden/msg14080.html)
>> moving through?  If not, and if I cleaned this up (tabs, documentation,
>> and test cases) and applied it to the current HEAD on Savannah, is
>> there a chance of getting this functionality into sort?
>
> Thanks for reviving this again.
> There was a more recent attempt that petered out unfortunately:
> http://www.mail-archive.com/address@hidden/msg14080.html
>
>>
>> Patch assumptions :
>>   * that numbers will use the best representation (never 1024b
>>     instead of 1k, etc.)
>>   * that the sizes will be specified via the suffixes b, K, M, G, T, P,
>>     E, Z, Y or their alternately cased variants
>>
>> The first assumption means that when two suffixes differ, only the
>> suffixes are compared.  This enables it to match the output of
>> `du -h` / `du --si`, but possibly not that of other tools that do not
>> conform to these assumptions.
>
> The consensus was that these assumptions are appropriate and useful.
>
> We assume C99 support now for coreutils so I tweaked your patch,
> the main change being to greatly shrink the lookup table initialisation.
> Note I commented out the lower case letters (except 'k') as I don't
> think any coreutils generate those and they could preclude supporting
> other suffixes in future. I'm not sure about doing that but I think it's
> better to err on the side of too few suffixes than too many?
>

That's much more readable.  I tacked in a size.  The standards do not
reference the lowercase letters you commented out, so I just deleted
them outright.
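The comparison rule from the patch assumptions (when the suffixes differ, only the suffix decides; when they match, compare the numbers), combined with a C99 designated-initializer lookup table of the kind described in this thread, can be sketched as follows. This is a hypothetical illustration, not the contents of updated.diff: the names suffix_rank, human_compare and compare_size_ptrs are invented, the suffix set (upper case plus 'k') is an assumption, and only non-negative sizes are handled.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical rank of each size suffix, filled in with C99 designated
   initializers; every other byte (including '\0', i.e. no suffix) is 0.
   Assumed suffix set: upper-case letters plus 'k'. */
static const unsigned char suffix_rank[UCHAR_MAX + 1] = {
  ['k'] = 1, ['K'] = 1, ['M'] = 2, ['G'] = 3, ['T'] = 4,
  ['P'] = 5, ['E'] = 6, ['Z'] = 7, ['Y'] = 8
};

/* Compare two NUL-terminated fields such as "1.5K" and "900M".
   Assumes non-negative values in their best representation
   (never "1024K" for "1M"), so differing suffixes alone decide. */
static int
human_compare (const char *a, const char *b)
{
  char *end_a;
  char *end_b;
  double va = strtod (a, &end_a);
  double vb = strtod (b, &end_b);
  int ra = suffix_rank[(unsigned char) *end_a];
  int rb = suffix_rank[(unsigned char) *end_b];

  if (ra != rb)
    return ra < rb ? -1 : 1;      /* different suffixes: rank decides */
  return (va > vb) - (va < vb);   /* same suffix: compare the numbers */
}

/* qsort adapter for an array of const char * */
static int
compare_size_ptrs (const void *x, const void *y)
{
  const char *const *a = x;
  const char *const *b = y;
  return human_compare (*a, *b);
}
```

Sorting { "2.0G", "512", "1.5K", "900M", "4.0K", "12M" } with qsort and compare_size_ps would then give 512, 1.5K, 4.0K, 12M, 900M, 2.0G. Because only the suffix is consulted when suffixes differ, an input such as 2048K, which breaks the best-representation assumption, would sort before 1M.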

> Something else to consider is flagging when
> a mixture of SI and IEC units is used, as
> the lack of support for this might not be obvious
> to users and could cause hard-to-debug issues.
> I.e. flag an error if the following input is presented:
>  999MB
>  998MiB
> I added a very quick hack for that to the patch for illustration.
>

While du only outputs the first letter, this makes the change better
suited to more general use.  I added a bounds check, but I don't see
that anything beyond your illustration would be needed.
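The SI/IEC mixing check and the bounds check discussed above might look roughly like this. It is a hypothetical sketch, not the code from the attached diff: the names unit_family, classify_suffix and units_consistent are invented, and it assumes a field is passed as a pointer plus a length, with the index of the suffix letter already known. A bare letter, as du prints, is treated as carrying no SI/IEC information.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: what the bytes after a suffix letter say about units. */
enum unit_family { UNITS_NONE, UNITS_SI, UNITS_IEC };

/* S is a field of LEN bytes whose suffix letter is at index POS.
   The field need not be NUL-terminated, so the byte after the letter
   is only read when it lies inside the field (the bounds check). */
static enum unit_family
classify_suffix (const char *s, size_t pos, size_t len)
{
  if (pos + 1 >= len)
    return UNITS_NONE;        /* bare letter, e.g. du -h's "4.0K" */
  if (s[pos + 1] == 'i')
    return UNITS_IEC;         /* "MiB" */
  if (s[pos + 1] == 'B')
    return UNITS_SI;          /* "MB" */
  return UNITS_NONE;
}

/* *SEEN remembers the first explicit family met in the input.
   Returns false when a field contradicts it (e.g. "999MB" followed
   by "998MiB"), which a caller could report as an error. */
static bool
units_consistent (enum unit_family *seen, const char *s,
                  size_t pos, size_t len)
{
  enum unit_family f = classify_suffix (s, pos, len);
  if (f == UNITS_NONE)
    return true;
  if (*seen == UNITS_NONE)
    {
      *seen = f;
      return true;
    }
  return *seen == f;
}
```

Fed the two example lines in order, the first call records SI and the second returns false, while a du-style "4.0K" is accepted either way.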

> I also noticed that you didn't terminate the fields before
> processing as was done for the other numeric sorts?
> So I changed that also in the attached patch but didn't
> analyze it TBH.
>

Your change was entirely appropriate.  I should have done that originally.
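Terminating the field matters because a sort key is a slice of the input line rather than a C string, so a string-based parser would otherwise read past the key's end. A hypothetical helper (parse_key is an invented name, not a coreutils function) that briefly writes a NUL at the key limit and restores the original byte afterwards, assuming the byte at the limit is writable:

```c
#include <stdlib.h>

/* Parse the key [START, LIM) as a number.  The key is not a C string,
   so temporarily terminate it at LIM, parse, then restore the byte.
   Assumes *LIM lies inside a writable buffer. */
static double
parse_key (char *start, char *lim)
{
  char saved = *lim;
  *lim = '\0';
  double v = strtod (start, NULL);
  *lim = saved;
  return v;
}
```

For the line 1234K with a key covering only its first two characters, parse_key returns 12 and leaves the line unchanged, whereas strtod on the unterminated slice would read 1234 (and a suffix check would then see the K outside the key).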

>
> P.S. Obviously docs, help, and tests need to be written,
> but we can do that after we get the implementation done.
>

I've attached the updated diff.

Thanks for taking an interest in this.

Michael Speer

Attachment: updated.diff
Description: Text Data

