bug#32099: New uniq option for speed

bug-coreutils

From:	Paul Eggert
Subject:	bug#32099: New uniq option for speed
Date:	Mon, 9 Jul 2018 09:35:35 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

Kingsley G. Morse Jr. wrote:

> $ echo -e "b\na\nb" | awk '!seen[$0]++'
>
> It basically avoids sorting by using hashed
> indexes into an associative array to find
> previously seen values in about O(N) time.

No, it's still O(N log N) because hash table lookup is really O(log N), despite what many textbooks say. Though no doubt we could make it faster than the sorting pipeline, it wouldn't be algorithmically faster. See, for example:


https://lemire.me/blog/2009/08/18/do-hash-tables-work-in-constant-time/
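
For reference, the two approaches being compared can be sketched side by side. This is only an illustration: the function names dedup_unsorted and dedup_sorted are made up for this example and are not part of coreutils or of any proposed uniq option. The first wraps the awk one-liner quoted above; the second is the usual sorting pipeline.

```shell
# Hypothetical helper: print each distinct input line once, in
# first-seen order, using awk's associative array (the quoted one-liner).
dedup_unsorted() {
    awk '!seen[$0]++'
}

# Hypothetical helper: the conventional sorting pipeline. It also prints
# each distinct line once, but in sorted order rather than input order.
dedup_sorted() {
    sort | uniq
}

printf 'b\na\nb\n' | dedup_unsorted   # prints "b" then "a"
printf 'b\na\nb\n' | dedup_sorted     # prints "a" then "b"
```

Besides any speed difference, the two differ in output order: the hashing version preserves the order in which lines first appear, while the sorting pipeline does not.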

Current Thread

bug#32099: New uniq option for speed, Kingsley G. Morse Jr., 2018/07/08
- bug#32099: New uniq option for speed, Paul Eggert <=
- bug#32099: New uniq option for speed, Assaf Gordon, 2018/07/09
  - bug#32099: Benchmarks: Hashing ~70% faster (Was: New uniq option for speed), Kingsley G. Morse Jr., 2018/07/10
    - bug#32099: Benchmarks: Hashing ~70% faster (Was: New uniq option for speed), Assaf Gordon, 2018/07/10
    - bug#32099: datamash wins! (Was: New uniq option for speed), Kingsley G. Morse Jr., 2018/07/11
    - bug#32099: Benchmarks: Hashing ~70% faster (Was: New uniq option for speed), Bernhard Voelker, 2018/07/12
    - bug#32099: Testing with other options (Was: Benchmarks: Hashing ~70% faster ), Kingsley G. Morse Jr., 2018/07/13