bug-coreutils

Re: efficient version of 'sort | uniq -c | sort -n'?


From: Matthew Woehlke
Subject: Re: efficient version of 'sort | uniq -c | sort -n'?
Date: Mon, 21 May 2007 17:52:04 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070221 Thunderbird/1.5.0.10 Mnenhy/0.7.4.0

Paul Eggert wrote:
> Matthew Woehlke writes:
>> Is there an efficient implementation of 'sort | uniq -c | sort -n'? I
>> have a 4 GB core file I want to run 'strings' on, and the above is
>> really slow.
>
> See Jon Bentley, Don Knuth, Doug McIlroy, "Programming pearls: a
> literate program", CACM 29, 6 (June 1986), 471-483
> <http://doi.acm.org/10.1145/5948.315654>.  Source code is included.  I
> think Knuth's solution will run rings around all the solutions
> proposed so far, if you tune it right (see below).
>
>> Is there a way already in coreutils to do this? If not, would there
>> be any interest in adding such a method?
>
> I dunno, it sounds pretty specialized.  Though there may be some
> interest in a combination "sort | uniq -c", I wouldn't think there'd
> be any interest in combining all three.

I would think 'sort --unique-with-count' would be the most valuable, although I'm not sure why you /wouldn't/ want the result sorted by number of occurrences (at least, that's the most obvious use case for the combination I can think of).

I'll put looking at the above-mentioned algorithm on my to-do list; in other words, don't expect anything soon, if ever. :-) But thanks for answering the second question!
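
For illustration only, here is a minimal sketch of what the three-stage pipeline computes, done in one pass with a hash table instead of an initial sort. It is not the program from the cited article and not an existing coreutils option; the function names count_lines and print_counts are made up for this example, and it assumes the input is text that fits Python's default decoding and that the set of distinct lines fits in memory.

```python
import sys
from collections import Counter


def count_lines(lines):
    """Return (count, line) pairs sorted by ascending count.

    Roughly what 'sort | uniq -c | sort -n' produces, but the input
    itself is never sorted; only the distinct lines are.
    """
    # One pass over the input: tally each distinct line in a hash table.
    counts = Counter(line.rstrip("\n") for line in lines)
    # Sort the distinct lines by count; ties are broken by line text
    # so the output order is deterministic.
    return sorted((n, text) for text, n in counts.items())


def print_counts(lines, out=sys.stdout):
    """Write the counts in a 'uniq -c'-like layout (count, space, line)."""
    for n, text in count_lines(lines):
        out.write(f"{n:7d} {text}\n")
```

Calling print_counts(sys.stdin) at the end of a script would let it sit at the end of a 'strings core | ...' pipeline. The trade-off is memory: the table grows with the number of distinct lines, whereas sort can spill to temporary files.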

--
Matthew
When in doubt, duct tape!




