[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: potential feature addition to coreutils' sort.c: print at most N lin
From: |
Pádraig Brady |
Subject: |
Re: potential feature addition to coreutils' sort.c: print at most N lines |
Date: |
Wed, 06 Mar 2013 09:24:25 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 |
On 03/06/2013 01:26 AM, James Dowdell wrote:
> Pádraig,
>
> Thanks for the pointers. I am slowly making my way through sort.c,
> and preparing a patch; I estimate I'll have something ready in a few
> days or weeks.
>
> One final question before I go make this thing, regarding the thread
> conversation you included from 2004, in which Philippe De Muyter
> suggested he already had a functional patch involving a heap. Is
> there a code change I should be starting from, or has that been lost /
> never submitted?
I can't see it submitted.
At least there are performance numbers to compare against.
> Originally I also thought that the 'optimal' solution would involve
> heaps in some way, as everyone in that thread suspected, but I've
> surprised myself by discovering an n-log-k algorithm based off of
> merge sort, which I think would be much easier to dovetail with the
> existing code in sort.c and may actually work better too. If there is
> no patch to begin from, I'll be preparing my patch based on that
> unless someone objects.
n-log-k sounds good.
Is that dependent on the input data?
Is it worth special casing for performance,
when only getting a single top/bottom entry?
Have you started the copyright assignment process in parallel?
It like Jim's suggestion of a single --range option,
rather than the more traditional separated --head and --tail.
There are arguments for calling this --slice, to align
with javascript/php/python/..., but that's just minor renaming.
thanks!
Pádraig.