Re: [coreutils] added ability in sort to skip n number of lines for each file

From:	Jim Hester
Subject:	Re: [coreutils] added ability in sort to skip n number of lines for each file
Date:	Tue, 23 Nov 2010 10:57:46 -0500

Below is an updated, proper patch. It is quite a bit larger than my first, but it should address all of the concerns from Assaf and Pádraig.

My main motivation here is not just to make this common operation less annoying; it is mostly increased performance. I made a test dataset of 10 files with 3 header lines each and 500,000 lines to sort, then ran sort on an 8-core machine twice: once using head and tail as Pádraig suggests, and once using my implemented header skip. In the timings below, wall-clock time drops from about 51.7s with the pipeline to about 31.8s with the header skip, and the two outputs are identical. Larger files seem to show a similar speedup. I believe the speedup comes from the fact that the multithreaded sort is trying to read from the buffer faster than tail can write to it.

>time { (head -q -n 3 test[0-9] | head -n 3; tail -q -n+4 test[0-9] | ./sort -n ) > out2; }

real    0m51.660s
user    2m0.324s
sys     0m4.115s

>time ./sort -n -l 3 test[0-9] > out

real    0m31.834s
user    2m17.775s
sys     0m3.981s
>diff out out2
>
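
For anyone who wants to check the head/tail variant on a small input first, here is a minimal sketch that uses only options already present in coreutils (the -l option above exists only in my patch). The helper name sort_keep_header and the sample file names are made up for illustration, and the sketch assumes every file carries the same number of header lines.

```shell
#!/bin/sh
# Sketch only: sort the bodies of several files numerically and keep a
# single copy of the header, using existing head/tail/sort options.
# sort_keep_header is a hypothetical helper name, not part of coreutils.
sort_keep_header() {
    n=$1; shift                                  # header lines per file
    head -q -n "$n" "$@" | head -n "$n"          # header of the first file only
    tail -q -n "+$((n + 1))" "$@" | sort -n      # bodies of all files, sorted
}

# Tiny demonstration data set (file names are assumptions).
tmp=$(mktemp -d)
printf 'h1\nh2\nh3\n5\n1\n' > "$tmp/a"
printf 'h1\nh2\nh3\n4\n2\n3\n' > "$tmp/b"

result=$(sort_keep_header 3 "$tmp/a" "$tmp/b")
printf '%s\n' "$result"

rm -rf "$tmp"
```

This should print the three header lines once, followed by 1 through 5 in order. The body data goes through a pipe between tail and sort here, which is the path I suspect limits the multithreaded sort in the first timing above.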

2010/11/22 Pádraig Brady <address@hidden>

On 22/11/10 22:21, Pádraig Brady wrote:
> Perhaps something like:
>
> (head --no-header -n1 file.* | head -n1; tail --no-header -n+2 file.* | sort)
>
> I.E. add the --no-header option to suppress the ==> file name <== annotations
> which would allow using `head` and `tail` in general for this.

Of course this being useful, it's already supported:

(head -q -n1 file.* | head -n1; tail -q -n+2 file.* | sort)

cheers,
Pádraig

sort_skip_lines_2.diff
Description: Binary data
