bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#32099: New uniq option for speed


From: Kingsley G. Morse Jr.
Subject: bug#32099: New uniq option for speed
Date: Sun, 8 Jul 2018 14:03:56 -0700
User-agent: NeoMutt/20170306 (1.8.0)

Thank you very much for maintaining coreutils!

It seems to me that it would be hard to
underestimate how useful it is.


The main reason I'm writing is to humbly suggest
a performance improvement to the "uniq" command.

It currently requires its input to be sorted.

Sorting generally takes about O(N*log(N)) time.

However, it seems to me that sorting is
unnecessary.

You can see this is so in the following example

    $ echo -e "b\na\nb" | awk '!seen[$0]++'

It basically avoids sorting by using hashed
indexes into an associative array to find
previously seen values in about O(N) time.

I expect it would be about O(log(N)) times faster
than the sorting the uniq command currently
requires.

For big values of N, the time saved could be
substantial.


Humble suggestion.

Add an option to the "uniq" command.

Let it work quickly with unsorted data.

Maybe the new option could be something like
"-usi" for "unsorted input".

Thanks,
Kingsley

-- 
Time is the fire in which we all burn.






reply via email to

[Prev in Thread] Current Thread [Next in Thread]