bug-coreutils

Re: new coreutil? shuffle - randomize file contents


From: Frederik Eaton
Subject: Re: new coreutil? shuffle - randomize file contents
Date: Sun, 29 May 2005 22:08:54 -0700
User-agent: Mutt/1.5.9i

On Wed, May 25, 2005 at 10:58:41AM +0100, James Youngman wrote:
> On Tue, May 24, 2005 at 09:55:35AM -0700, Paul Eggert wrote:
> 
> > That way, you could use, e.g.:
> > 
> >   sort -k 2,2 -k R
> > 
> > which would mean "sort by the 2nd field, but if there are ties then
> > sort the ties randomly".  "sort -R" would be short for "sort -k R".
> 
> Perhaps this approach avoids the problems that were discussed earlier
> regarding expectations about lines with identical keys "shuffling"
> together.

I hope it is agreed that the conclusion reached earlier was that both
behaviors - identical keys shuffling *together* vs. *apart* - would be
useful in different situations. We came up with a number of situations
in which one behavior or the other was necessary, and we didn't really
come up with any other ideas for useful behaviors.

I think we have yet to consider other ways of getting these two
behaviors, however. For instance, "-s" could be seen as an instruction
to "last of all, sort by the input row number". If we implement
randomization as "sort by hash of keys", which gives a "together"
shuffle, then including the input row number in this hash would give
the contrasting behavior that Paul Eggert is suggesting above: the
"apart" shuffle. So with a rephrasing of the "-s" option description,
it might make sense for "-R" to indicate the "together" behavior and
"-Rs" to indicate the "apart" behavior. In this case "-s" wouldn't mean
"stable" so much as "depends on input ordering". I don't know if this
is sensible. Anyway, here is the end of the last thread:

http://lists.gnu.org/archive/html/bug-coreutils/2005-02/msg00005.html
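
To make the "sort by hash of keys" idea concrete, here is a rough
Python sketch. It is only an illustration, not proposed code for
sort(1): the function name shuffle_lines, its parameters, and the
choice of a salted SHA-256 are all my own assumptions. With
apart=False only the key is hashed, so identical keys land next to
each other (the "-R" case); with apart=True the input row number is
mixed into the hash, so identical keys scatter (the "-Rs" case).

```python
import hashlib
import os


def shuffle_lines(lines, key=lambda line: line, apart=False, salt=None):
    """Order lines by a salted hash of their key (hypothetical helper).

    apart=False: equal keys hash equally, so they stay adjacent.
    apart=True: the input row number is also hashed, so equal keys
    are ranked independently of one another.
    """
    if salt is None:
        salt = os.urandom(16)  # a fresh salt gives a different order each run

    def rank(item):
        row, line = item
        h = hashlib.sha256(salt)
        h.update(key(line).encode())
        if apart:
            # Mixing in the row number is what makes the result
            # "depend on input ordering".
            h.update(b"\0" + str(row).encode())
        return h.digest()

    return [line for _, line in sorted(enumerate(lines), key=rank)]
```

For example, shuffle_lines(["a", "b", "a", "c"]) always returns the
two "a" lines side by side, while passing apart=True lets them end up
anywhere. Passing a fixed salt makes either order repeatable.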

Frederik



