Re: feature request: gzip/bzip support for sort
From: Jim Meyering
Subject: Re: feature request: gzip/bzip support for sort
Date: Thu, 18 Jan 2007 21:58:26 +0100
Paul Eggert <address@hidden> wrote:
> Jim Meyering <address@hidden> writes:
>> So, with just one trial each, I see a 19% speed-up.
>
> Yaayyy! That's good news. Thanks for timing it. I read your email
> just after talking with Dan (in person) about how we'd time it. I
> just bought 1 TB worth of disk for my home computer and hadn't hooked
> it up yet, so was going to volunteer that, but you beat me to it.
I've done some more timings, this time with two more input sizes.
Here's the summary, comparing plain sort with sort --compress=gzip:
2.7GB: 6.6% speed-up
10.0GB: 17.8% speed-up
For the smaller input, I also did as James Youngman suggested
and used "cat" as the no-op compressor/decompressor.
That made sort run 34% longer than with gzip (about 25% longer than
with no compression).
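For reference, the speed-up percentages above follow from the elapsed times in the runs shown below. Here is a minimal sketch of that arithmetic, assuming awk is available; the helper names to_secs and pct are made up for this illustration and are not part of the original measurements:

```shell
# Convert an elapsed time of the form [h:]m:s into seconds.
to_secs() {
  echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Percentage by which run $2 is faster than baseline $1 (both in seconds).
pct() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (base - t) / base * 100 }'
}

# 2.7GB input: plain sort vs. --compress=gzip
speedup_small=$(pct "$(to_secs 15:53.49)" "$(to_secs 14:50.16)")
# 10GB input: plain sort vs. --compress=gzip
speedup_large=$(pct "$(to_secs 1:13:13)" "$(to_secs 1:00:10)")

echo "2.7GB: $speedup_small% speed-up"
echo "10.0GB: $speedup_large% speed-up"
```

The speed-up is measured against the uncompressed run, i.e. (plain - gzip) / plain.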
====================
Here's the smaller input:
$ seq 9999999 > k
$ cat k k k k k k k k k > j
$ cat j j j j > sort-in
$ wc -c sort-in
2839999968 sort-in
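As a sanity check on that byte count (not part of the original measurements): the size of sort-in should be the size of k multiplied by 9 and then by 4. A small sketch that computes it without writing the large files, with variable names chosen just for this example:

```shell
# Size of one copy of k, computed without writing it to disk.
k_bytes=$(seq 9999999 | wc -c)
# j is 9 copies of k; sort-in is 4 copies of j.
total=$((k_bytes * 9 * 4))
echo "$total"
```

This should agree with the wc -c output shown above.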
With --compress=gzip:
$ /usr/bin/time ./sort -T. --compress=gzip < sort-in > out
814.07user 29.97system 14:50.16elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (4major+2821589minor)pagefaults 0swaps
With no --compress= option:
$ /usr/bin/time ./sort -T. < sort-in > out
398.98user 17.08system 15:53.49elapsed 43%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+229797minor)pagefaults 0swaps
With --compress=$PWD/cat-wrap:
[where the cat-wrap script accepts and ignores the -d option:
printf '#!/bin/sh\ntest $# != 0 && test "x$1" = x-d && shift; exec cat "$@"\n' \
> cat-wrap
chmod a+x cat-wrap
BTW, this example already demonstrates how nice it would be to be able
to specify a decompressor separately, for when the decompressor isn't
"compressor -d".
]
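Until a separate decompressor can be specified, the same wrapper trick can route -d to a different program. This is only a sketch: the name comp-wrap and the COMPRESSOR and DECOMPRESSOR variables are hypothetical, and the assumption that sort passes -d as the first argument when decompressing (and no arguments when compressing) is inferred from the cat-wrap script above.

```shell
# Create a hypothetical wrapper that dispatches on a leading -d.
cat > comp-wrap <<'EOF'
#!/bin/sh
# COMPRESSOR and DECOMPRESSOR are made-up names for this sketch;
# they are left unquoted below so that each may carry its own options.
: "${COMPRESSOR:=gzip}"
: "${DECOMPRESSOR:=gunzip}"
if test "$#" -gt 0 && test "x$1" = x-d; then
  shift
  exec $DECOMPRESSOR "$@"
fi
exec $COMPRESSOR "$@"
EOF
chmod a+x comp-wrap
```

It would then be used the same way as cat-wrap, e.g. with --compress=$PWD/comp-wrap.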
$ /usr/bin/time ./sort -T. --compress=$PWD/cat-wrap < sort-in > out
439.67user 54.02system 19:50.86elapsed 41%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+2817586minor)pagefaults 0swaps
=================================
Using a 10GB data set (exactly 10737418240 bytes),
formed by concatenating four copies of the above and then truncating
to the desired length, ...
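The exact commands used to build that file are not shown here. One way to do it, as a sketch only: concatenate the copies and cut the stream with head -c (assuming a head that supports -c, as GNU and BSD head do). The function name make_truncated and the file names small-in and small-out are made up for this example, which runs at a much smaller scale.

```shell
# make_truncated SRC COPIES BYTES DEST: concatenate COPIES copies of SRC
# and keep only the first BYTES bytes of the result in DEST.
make_truncated() {
  src=$1 copies=$2 bytes=$3 dest=$4
  i=0
  while [ "$i" -lt "$copies" ]; do
    cat "$src"
    i=$((i + 1))
  done | head -c "$bytes" > "$dest"
}

# Scaled-down demonstration: four copies of a small file, cut to 10240 bytes.
seq 1000 > small-in
make_truncated small-in 4 10240 small-out
wc -c < small-out
```

At full scale this would correspond to something like make_truncated with 4 copies of the 2.7GB file and a length of 10737418240.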
$ /usr/bin/time ./sort -T. --compress=gzip < sort-in > out; rm out
3330.45user 139.57system 1:00:10elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (5major+10679797minor)pagefaults 0swaps
$ /usr/bin/time ./sort -T. < sort-in > out; rm out
1643.09user 86.83system 1:13:13elapsed 39%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+233951minor)pagefaults 0swaps
The result: an 18% speed-up.