|
From: | Paul Eggert |
Subject: | bug#32099: New uniq option for speed |
Date: | Mon, 9 Jul 2018 09:35:35 -0700 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 |
Kingsley G. Morse Jr. wrote:
$ echo -e "b\na\nb" | awk '!seen[$0]++' It basically avoids sorting by using hashed indexes into an associative array to find previously seen values in about O(N) time.
No, it's still O(N log N) because hash table lookup is really O(log N), despite what many textbooks say. Though no doubt we could make it faster than than the sorting pipeline, it wouldn't be algorithmically faster. See, for example:
https://lemire.me/blog/2009/08/18/do-hash-tables-work-in-constant-time/
[Prev in Thread] | Current Thread | [Next in Thread] |