bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Makes sort create random order


From: Frederik Eaton
Subject: Re: [PATCH] Makes sort create random order
Date: Fri, 28 Jan 2005 17:10:09 -0800
User-agent: Mutt/1.5.6+20040907i

> > You don't get exactly the same behavior, e.g. the final distinction
> > between 3 and 3.0 disappears, but I think this is OK.
> 
> But this would cause the randomization option to not be orthogonal to
> the rest of the options.  The resulting behavior would be hard to
> document, and it would be hard to handle bug reports when it didn't
> behave the way users expected.
> 
> We should have a simple way to explain the proposed behavior, a way
> that works in general.

It doesn't sound complicated to me, and I think few people would care
about this corner case. Can you give me an example where someone would
be randomly numerically sorting different orderings of a list of
numbers printed in different formats with the same seed, and be angry
when he got different results? Let's try to stay on track.

> > c) whether the sorting should depend on the order of the input keys or
> > the keys themselves
> >
> > I want it to depend on the keys themselves.
> 
> No, this issue should be orthogonal to the rest of "sort".  In the
> rest of "sort", input order matters when you specify the -s (--stable)
> option; otherwise it doesn't matter.  Randomness shouldn't change this.

It wouldn't, in my proposal.

> > Why don't you think I want to retain ties? If two keys are equal, and
> > you take their hash, they are still equal.
> 
> But you appended a different random number to each key, right?  So
> they won't be equal after that.

No.

> Perhaps I didn't understand your proposed algorithm.  Did you intend
> to (say) append the same random number to each key, compute an MD5
> checksum of the result, and then sort based on the checksum?

Yes, I'd thought that was clear. (The words "salt" and "random seed"
in my original posting were meant to indicate "same for each key".)

> > As for the nature of the investigations, well, anything for which
> > one needs a random permutation, I suppose. Also, random sampling
> > with sort -R | head, though somewhat inefficient, but convenient,
> 
> But these uses should not attempt to sort ties together.  They should
> attempt to sort them separately.

Hmm, I don't see any of these uses as involving duplicate elements. If
they did, it would become impossible to determine exactly which
elements were sampled, or exactly what your permutation was.

> Perhaps it would be better to think of things in a completely
> different way.  Currently "sort" lets you specify keys that can be any
> contiguous set of input columns.  Perhaps we should extend the syntax
> for keys, so that the option "-k random" means to use an imaginary
> extra column of completely random values.  That would be easy to
> explain to people.  It would also let people decide for themselves
> what to do with ties.

I wouldn't be opposed to having both alternatives. However, I wouldn't
be interested in implementing your suggestion since it sounds more
difficult to implement, and less useful. I like the idea of having the
final ordering be repeatable by specifying the same random seed, which
is a property of my proposal - additionally, the output is robust WRT
to changes in the ordering of the input, and the order of unaffected
output elements doesn't change when input elements are added or
removed, which could be useful in certain cases.

Frederik




reply via email to

[Prev in Thread] Current Thread [Next in Thread]