From: Kaz Kylheku (Coreutils)
Subject: Re: how to speed up sort for partially sorted input?
Date: Tue, 10 Aug 2021 22:23:31 -0700
User-agent: Roundcube Webmail/0.9.2
On 2021-08-10 22:06, Kaz Kylheku (Coreutils) wrote:
> On 2021-08-07 17:46, Peng Yu wrote:
>> Hi, Suppose that I want to sort an input by column 1 and column 2
>> (column 1 is of a higher priority than column 2). The input is
>> already sorted by column 1. Is there a way to speed up the sort
>> (compared with not knowing column 1 is already sorted)? Thanks.
>
> Since you know that column 1 is sorted, it means that a sequential
> scan of the data will reveal chunks that have the same column 1
> value. You just have to read and separate these chunks, and sort
> each one individually by column 2.
>
> GNU Awk has the wherewithal for this sort of thing; it has some
> facilities for sorting associative arrays.
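For illustration, here is a minimal Python sketch of that chunk-and-sort idea (not from the original thread; the function name and sample data are my own). Because the input is already sorted by column 1, records sharing a column 1 value are consecutive, so grouping consecutive records and sorting each group by column 2 yields the full (col1, col2) ordering:

```python
from itertools import groupby
from operator import itemgetter

def sort_presorted(lines):
    """Sort whitespace-delimited records by (col1, col2), exploiting
    the fact that the input is already sorted by col1."""
    rows = [line.split() for line in lines]
    out = []
    # groupby only groups *consecutive* equal keys, which is exactly
    # the guarantee a col1-presorted input gives us.
    for _, chunk in groupby(rows, key=itemgetter(0)):
        # Each chunk shares one col1 value; sort it by col2 alone.
        out.extend(sorted(chunk, key=itemgetter(1)))
    return [" ".join(row) for row in out]

data = ["a 3", "a 1", "b 2", "b 1", "c 5"]
print("\n".join(sort_presorted(data)))
```

Each chunk is sorted independently, so the cost is the sum of sorting many small runs rather than one big O(n log n) sort over the whole file, and the chunks can be processed in a streaming fashion.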
TXR Lisp: structure + function + awk macro:

  (defstruct rec ()
    f1 rec)

  ;; sort list of records by slot f1, and then
  ;; dump their rec slots in order via put-line.
  (defun dump (data)
    (mapdo [chain .rec put-line] (nsort data : .f1)))

  (awk
    ;; two local variables
    (:let f0-prev data)
    ;; if field zero is not same as f0-prev, sort and dump data,
    ;; then set data to nil again
    ((nequal [f 0] f0-prev) (dump data) (set data nil))
    ;; if we have a second field, set f0-prev to that field,
    ;; then capture the record in a structure and push it on
    ;; the list. Go to the next record.
    ([f 1] (set f0-prev [f 0])
           (push (new rec f1 [f 1] rec rec) data)
           (next))
    ;; we don't have a second field: just sort and
    ;; dump the accumulated data, and also print this record.
    (t (dump data) (prn))
    ;; end of data: sort and dump accumulated data.
    (:end (dump data)))