From: Pádraig Brady
Subject: Re: potential feature addition to coreutils' sort.c: print at most N lines
Date: Mon, 04 Mar 2013 00:47:33 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
On 03/03/2013 05:32 PM, James Dowdell wrote:
> I'm considering writing a patch for sort.c to add a new feature, related to a
> stackoverflow inquiry I wrote
> (http://stackoverflow.com/questions/14882897/what-standard-commands-can-i-use-to-print-just-the-first-few-lines-of-sorted-out).
>
> This would be my first patch, and this is my first time messaging a gnu list;
> apologies if I'm "doing it wrong."
>
> I use GNU sort a lot, and routinely find myself in the situation of
> executing, e.g.:
>
> $ sort ... | head -n 1000
>
> This can be needlessly slow when the input is huge, because sort does
> a lot of work that head throws away.
>
> I propose a new parameter, "-H, --head=NLINES", which has sort only print at
> most NLINES of output. More than just a filter at the end like | head, it
> would avoid unnecessary sorting on more than NLINES of output.
>
> I want to know the procedure for submitting a patch, and the likelihood that
> such a patch would even be considered, before I spend time to parse the whole
> sort.c file and propose a complete and rigorous solution (which would be
> analogous to submitting the patch). From a quick glance at the source, my
> current strategy would be to alter the merge nodes when this parameter is set
> so that the number of lines listed per node is clamped to NLINES. While less
> efficient than an ideal solution, it would be more efficient than what's
> currently in place, and has the benefits of minimal code edits and negligible
> negative performance impact on mainstream use when the parameter is not
> passed.
>
> All feedback welcome, thank you.
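
For reference, the clamping strategy described in the quoted message can be sketched roughly as follows. This is only an illustration in Python, not code from sort.c: the names `clamped_merge`, `head_sort`, and `chunk_size` are hypothetical, and the in-memory chunks stand in for sort's real merge nodes. The point is just that every sorted run and every merge result is truncated to NLINES, so no step carries more lines forward than the final output can use.

```python
import heapq
from itertools import islice


def clamped_merge(left, right, nlines):
    """Merge two already-sorted runs, keeping only the first nlines lines."""
    return list(islice(heapq.merge(left, right), nlines))


def head_sort(lines, nlines, chunk_size=4):
    """Return the first nlines lines of sorted(lines).

    Input is split into chunks of chunk_size lines (a stand-in for sort's
    buffers). Each chunk is sorted and clamped to nlines, and each merge
    is clamped too, so at most nlines lines survive any step.
    """
    result = []
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            # Sort the full chunk, clamp it, and fold it into the result.
            result = clamped_merge(result, sorted(chunk)[:nlines], nlines)
            chunk = []
    if chunk:
        # Handle the final, possibly short, chunk the same way.
        result = clamped_merge(result, sorted(chunk)[:nlines], nlines)
    return result
```

Calling `head_sort(lines, 1000)` on an iterable of lines should give the same lines as `sort ... | head -n 1000` under plain byte-wise comparison, while never holding more than one chunk plus 1000 lines at a time.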
There is general agreement that this is worthwhile.
Please read these first:
http://lists.gnu.org/archive/html/bug-coreutils/2004-04/msg00157.html
http://lists.gnu.org/archive/html/bug-coreutils/2009-07/msg00019.html
As for contributing the patch, it would be much appreciated.
For contribution details, please see the HACKING file:
http://git.sv.gnu.org/cgit/coreutils.git/plain/HACKING
In summary, you would submit a patch against the latest git tree
to address@hidden. Also, for a patch of this significance,
you would need to follow the copyright assignment procedure.
thanks!
Pádraig.