bug-coreutils

Re: new coreutil? shuffle - randomize file contents


From: Frederik Eaton
Subject: Re: new coreutil? shuffle - randomize file contents
Date: Tue, 24 May 2005 08:33:23 -0700
User-agent: Mutt/1.5.6+20040907i

On Tue, May 24, 2005 at 11:25:48AM +0100, address@hidden wrote:
> James Youngman wrote:
> >Davis Houlton writes:-
> >
> >>I recently had to write a shuffle utility for a personal project and
> >>was wondering if it would make a candidate for the coreutils
> >>suite. It seems like the kind of utility the toolbox could use
> >>(maybe under section 3. Output of entire files).
> >
> >This behaviour was proposed a few months ago as a new option to
> >"sort", and there were objections around the ideas of keeping the
> >shuffled sort stable (i.e. that lines with the same key should appear
> >in groups in the shuffled output) and of repeatability (e.g. giving a
> >'random seed' to ensure output is reproducible[*]).  Much discussion
> >followed and ended up with many people agreeing that this behaviour
> >properly belonged in a separate program.
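
Both requirements raised above, keeping identical keys grouped and making the output reproducible from a seed, can be met at once by sorting on a seeded hash of each key. The following is only a minimal sketch in Python, assuming the whole line is the key and the seed is an arbitrary string (neither detail was settled in the thread); `random_sort` is a hypothetical name, not part of any coreutils interface.

```python
import hashlib


def random_sort(lines, seed):
    """Order lines pseudo-randomly under a given seed (illustrative sketch).

    Each line's sort key is a SHA-256 digest of the seed plus the line.
    Identical lines therefore get identical keys and end up adjacent,
    and the same seed always produces the same order.
    """
    def key(line):
        return hashlib.sha256((seed + "\0" + line).encode("utf-8")).digest()

    return sorted(lines, key=key)


if __name__ == "__main__":
    data = ["apple", "banana", "apple", "cherry", "banana", "date"]
    first = random_sort(data, seed="42")
    print(first)
    # Same seed, same order: the shuffle is reproducible.
    assert random_sort(data, seed="42") == first
```

A different seed gives a different order, but duplicate lines always stay together, which is the "stable" behaviour described above rather than a uniform permutation of every input line.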

I didn't get that impression. There was some resistance to the idea,
but no conclusion, except that two variants would be needed if the
functionality were added to 'sort'. See the thread starting on
2005/1/29.

I think that since people keep asking for this feature, and offering
to implement it, there is no reason not to make a commitment to adding
it. If you don't want to use it, you can easily avoid using it.

> >So, I think that "shuffle" is a good idea.
> 
> I don't agree. You just end up duplicating
> 99% of the sort logic. Logically the only difference from sort
> is the low-level ordering algorithm, so I vote for an extra arg to
> sort: --sort="random". Another arg to the --sort option could be
> "version", which would sort files with version numbers in their names
> appropriately.

I don't know about the "version" suggestion; at this point it seems
that you would be better off using perl. However, it does seem that
the 'sort' command line API could be extended to allow a great variety
of special sorting functions to be specified, with little burden on
the documentation and option syntax, by specifying the function name
as an argument to a general purpose option as with the "--sort XX" you
propose.
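
The argument that only the low-level ordering differs, with the ordering chosen by name through one general-purpose option, can be illustrated by keeping a single shared sort routine and swapping in a key function looked up by name. This is a hypothetical Python sketch: `sort_lines`, `ORDERINGS` and the ordering names stand in for the proposed `--sort XX` option and are not an existing `sort` interface. The "version" key is a simplified natural-number ordering, not any particular tool's version-sort rules.

```python
import hashlib
import re

# Hypothetical ordering functions, selected by name as with the proposed
# "--sort XX" option. Each returns a sort key for one line; the rest of
# the sorting machinery (here, Python's sorted) is shared by all of them.


def lexical_key(line, seed):
    return line


def version_key(line, seed):
    # Split out runs of digits and compare them numerically,
    # so "file10" sorts after "file9".
    parts = re.split(r"(\d+)", line)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def random_key(line, seed):
    # Keyed hash: reproducible for a given seed, and identical lines stay adjacent.
    return hashlib.blake2b(line.encode("utf-8"), key=seed.encode("utf-8")).digest()


ORDERINGS = {
    "lexical": lexical_key,
    "version": version_key,
    "random": random_key,
}


def sort_lines(lines, ordering="lexical", seed="0"):
    try:
        key_fn = ORDERINGS[ordering]
    except KeyError:
        raise ValueError(f"unknown ordering: {ordering!r}") from None
    return sorted(lines, key=lambda line: key_fn(line, seed))


if __name__ == "__main__":
    files = ["file10.txt", "file9.txt", "file1.txt"]
    print(sort_lines(files, "lexical"))  # ['file1.txt', 'file10.txt', 'file9.txt']
    print(sort_lines(files, "version"))  # ['file1.txt', 'file9.txt', 'file10.txt']
```

Adding a new ordering in this design means adding one entry to the table; the option syntax and the rest of the program stay unchanged.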

As for the duplication of sort logic, yes, I think this is a good
reason to expand the functionality of 'sort'. But in my opinion a more
important reason is that the set of commands that one runs on a unix
system comprises a language, which is a very important language from a
user's perspective, and if people think that it should make sense
intuitively to use a certain command to do a certain thing, then that
command should be extended to do that thing, for the sake of the
conciseness, clarity and usability of this language. The fact that
something can be accomplished in another way means nothing if that
other way is cumbersome or unintuitive. Think about the big picture:
if somebody is arguing that there should be a standard well-known
'shuffle' command which is part of most distributions, then I would
say, well, adding such a feature to 'sort' would be a much simpler
solution; on the whole it would be easier to maintain, and easier to
remember.

Frederik
