coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: how to speed up sort for partially sorted input?


From: Kaz Kylheku (Coreutils)
Subject: Re: how to speed up sort for partially sorted input?
Date: Tue, 10 Aug 2021 22:23:31 -0700
User-agent: Roundcube Webmail/0.9.2

On 2021-08-10 22:06, Kaz Kylheku (Coreutils) wrote:
On 2021-08-07 17:46, Peng Yu wrote:
Hi,

Suppose that I want to sort an input by column 1 and column 2 (column
1 is of a higher priority than column 2). The input is already sorted
by column1.

Is there a way to speed up the sort (compared with not knowing column
1 is already sorted)? Thanks.

Since you know that colum 1 is sorted, it means that a sequential scan
of the data will reveal chunks that have the same colum1 value.

You just have to read and separate these chunks, and sort each one
individually by column 2.

GNU Awk has the wherewithal for this sort of thing; it has some facilities
for sorting associative arrays.

TXR Lisp: structure + function + awk macro:

(defstruct rec ()
  f1
  rec)

;; sort list of records by slot f1, and then
;; dump their rec slots in order via put-line.

(defun dump (data)
  (mapdo [chain .rec put-line] (nsort data : .f1)))

(awk
  ;; two local variables
  (:let f0-prev data)

  ;; if field zero is not same as f0-prev, sort and dump data,
  ;; then set data to nil again

  ((nequal [f 0] f0-prev) (dump data)
                          (set data nil))

  ;; if we have a second field, set f0-prev to that field,
  ;; then capture the record in a structure
  '' and push it on the list.  Go to the next record.

  ([f 1] (set f0-prev [f 0])
         (push (new rec f1 [f 1] rec rec) data)
         (next))

  ;; we don't have a second field: just sort and
  ;; dump the accumulated data, and also print this record.

  (t (dump data)
     (prn))

  ;; end of data: sort and dump accumulated data.
  (:end (dump data)))



reply via email to

[Prev in Thread] Current Thread [Next in Thread]