Re: [PATCH] Makes sort create random order
From: Paul Jarc
Subject: Re: [PATCH] Makes sort create random order
Date: Thu, 02 Sep 2004 10:52:44 -0400
User-agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3 (gnu/linux)
Paul Eggert <address@hidden> wrote:
> Thomas Habets <address@hidden> writes:
>
>>> sort: Add an ordering option -R that causes 'sort' to sort according
>>> to a random permutation of the correct sort order.
>>
>> This means that two different files, that happen to sort to the same output,
>> should give the same output when randomized with the same SEED. Is that
>> right? [*]
>
> Sort of, but not quite.
I couldn't find the "not quite" part of your explanation.
>> Is there a good reason for wanting this?
>
> By "this" do you mean "a fairly-formal definition", or "this
> particular definition of random sorting"? [...] If the latter,
> then because we want sort -R to have the usual properties that
> people expect from "sort", e.g., "sort -rR" should output in the
> reverse order of "sort -R".
Nit: they shouldn't expect that unless both runs also use the same seed. But
sort -R can still provide this just by permuting the original input
order, rather than the correct sort order. If we have a file A, and
we do:
$ sort -R A > B
$ sort -R --random-seed=deadbeef A > A1
$ sort -R --random-seed=deadbeef A > A2
$ sort -R --random-seed=deadbeef B > B1
$ sort -R --random-seed=deadbeef B > B2
Then we should expect that A1 and A2 have the same contents, and that
B1 and B2 have the same contents. But the TODO requirement would also
ensure that A1/A2 have the same contents as B1/B2. Is that really
needed?
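Here is a minimal Python sketch of the two candidate semantics that makes the A/B experiment concrete. Both function names are hypothetical, and neither is how the patch implements -R. The seeded SHA-256 ranking is just one assumed way to get a reproducible permutation of the sort order. B is modelled as A reversed rather than as real sort -R output.

```python
import hashlib
import random


def key_order_shuffle(lines, seed, reverse=False):
    """Hypothetical model of the TODO semantics: a seeded random
    permutation of the *sort order*. Each line is ranked by a seeded
    hash of its contents, so the output depends only on which lines
    are present, not on the order they arrive in."""
    def rank(line):
        return hashlib.sha256((seed + "\0" + line).encode()).hexdigest()
    # Equal ranks mean identical lines; the line itself breaks the tie.
    return sorted(lines, key=lambda line: (rank(line), line), reverse=reverse)


def input_order_shuffle(lines, seed, reverse=False):
    """Hypothetical alternative: a seeded shuffle of the *input order*.
    The same input and seed reproduce the same output, and reversing
    that output gives the -r behavior."""
    out = list(lines)
    random.Random(seed).shuffle(out)
    if reverse:
        out.reverse()
    return out


A = ["apple", "banana", "cherry", "date", "elderberry", "fig"]
B = list(reversed(A))  # stands in for "sort -R A > B": same lines, new order

for shuffle in (key_order_shuffle, input_order_shuffle):
    A1, A2 = shuffle(A, "deadbeef"), shuffle(A, "deadbeef")
    B1, B2 = shuffle(B, "deadbeef"), shuffle(B, "deadbeef")
    print(shuffle.__name__, A1 == A2, B1 == B2, A1 == B1)
# key_order_shuffle True True True
# input_order_shuffle True True False
```

Both models give A1 == A2 and B1 == B2, and both can offer "sort -rR is the reverse of sort -R" for a fixed seed. Only the sort-order model also forces A1 == B1.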
I'm also not sure that clustering lines with equivalent sort keys is
desirable, yet a random permutation of the sort order necessarily keeps
such lines adjacent.
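A small sketch can show the clustering effect. It assumes that "random permutation of the sort order" means ranking each line by a seeded hash of its sort key (a hypothetical model, not the patch's code). Lines whose keys compare equal then share a rank and always come out in one contiguous run. A seeded shuffle of the input order never looks at the key, so such lines would end up together only by chance.

```python
import hashlib


def key_order_shuffle(lines, seed, key):
    """Hypothetical model of "random permutation of the sort order"
    with a sort key: lines are ranked by a seeded hash of their *key*,
    so lines whose keys compare equal get the same rank and end up
    next to each other. Ties are broken by the whole line."""
    def rank(line):
        return hashlib.sha256((seed + "\0" + key(line)).encode()).hexdigest()
    return sorted(lines, key=lambda line: (rank(line), line))


def is_clustered(lines, key):
    """True if all lines sharing a key form one contiguous run."""
    seen, prev = set(), None
    for line in lines:
        k = key(line)
        if k != prev:
            if k in seen:       # this key already had a run earlier
                return False
            seen.add(k)
            prev = k
    return True


def first_field(line):
    """Sort key: the first whitespace-separated field (roughly -k1,1)."""
    return line.split()[0]


data = ["a 1", "b 1", "a 2", "c 1", "b 2", "a 3"]
print(is_clustered(key_order_shuffle(data, "deadbeef", first_field), first_field))
# True
```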
>>> if you sort a permutation of the same input file
>>> with the same --random-seed=SEED option twice, you'll get the same
>>> output. [**]
>>
>> Here however it does not explicitly say what I said above about two different
>> files.
>
> If two files sort to the same output, then they're permutations of
> each other. So [**] implies [*]. (The converse does not hold. See
> what I mean about the logic being tricky here?...)
No, I think the implication runs only one way: [*] implies [**]. [*] is
the more general case, placing a requirement on all permutations of the
same input; [**] is the special case where the two files are the same
permutation of the same input.
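The implication can be checked mechanically on a tiny input. This sketch reads [**] as "the same input sorted twice gives the same output" and reuses the two hypothetical models from above, redefined here so the block stands alone. Any strategy with [*] has [**] (take the identity permutation). The seeded input-order shuffle has [**] but not [*], so under this reading [**] does not imply [*].

```python
import hashlib
import itertools
import random


def key_order_shuffle(lines, seed):
    """Hypothetical "permute the sort order" model."""
    def rank(line):
        return hashlib.sha256((seed + "\0" + line).encode()).hexdigest()
    return sorted(lines, key=lambda line: (rank(line), line))


def input_order_shuffle(lines, seed):
    """Hypothetical "permute the input order" model."""
    out = list(lines)
    random.Random(seed).shuffle(out)
    return out


def has_star(shuffle, lines, seed):
    """[*]: every permutation of the input gives the same output."""
    outputs = {tuple(shuffle(list(p), seed)) for p in itertools.permutations(lines)}
    return len(outputs) == 1


def has_double_star(shuffle, lines, seed):
    """[**], read as: the same input sorted twice gives the same output."""
    return shuffle(lines, seed) == shuffle(lines, seed)


lines = ["w", "x", "y", "z"]
for shuffle in (key_order_shuffle, input_order_shuffle):
    print(shuffle.__name__, has_star(shuffle, lines, "deadbeef"),
          has_double_star(shuffle, lines, "deadbeef"))
# key_order_shuffle True True
# input_order_shuffle False True
```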
paul