[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH] sort: parallel external sort implementation
From: |
Joey Degges |
Subject: |
[PATCH] sort: parallel external sort implementation |
Date: |
Wed, 3 Mar 2010 22:01:27 -0800 |
Hello,
Matt (cc'd, not on list) and I have developed a patch that parallelizes sort
using pthreads. The threading approach taken here is to break up input files
into groups based on physical device and sort each of those groups in
parallel.
This allows for storage and processing resources to be more fully utilized
during the sort process.
Below is a set of experimental results and the statistics of the machine
that
they were run on. The test below show speed improvements from 45% - 64%.
In this patch the method used to determine which device a particular file is
on
relies on comparisons between struct stat -> st_dev fields. This leads to
some
problems in cases where files exist on different partitions of the same
disk.
Preliminary tests have shown that under these situations some performance
improvements are still seen, but they are not as dramatic as in the ideal
case.
Please send any comments/suggestions.
Thanks,
Joey
system details
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
cache size: 4096 KB
RAM: 2G
disk details:
/q/0:
Timing cached reads: 7556 MB in 2.00 seconds = 3781.13 MB/sec
Timing buffered disk reads: 320 MB in 3.00 seconds = 106.50 MB/sec
/q/1:
Timing cached reads: 7772 MB in 2.00 seconds = 3888.09 MB/sec
Timing buffered disk reads: 224 MB in 3.03 seconds = 74.00 MB/sec
/q/2:
Timing cached reads: 7444 MB in 2.00 seconds = 3724.34 MB/sec
Timing buffered disk reads: 222 MB in 3.01 seconds = 73.77 MB/sec
/q/3:
Timing cached reads: 7512 MB in 2.00 seconds = 3758.49 MB/sec
Timing buffered disk reads: 240 MB in 3.02 seconds = 79.50 MB/sec
before each run the cache was dropped with:
echo 3 > /proc/sys/vm/drop_caches
store temp files in RAM:
mount -t tmpfs -o size=1000M /ram /ram
export TMPDIR=/ram
multidisk sort with 4x 50M inputs
$ time -v sort_multidisk /q/0/50M /q/1/50M /q/2/50M /q/3/50M > /dev/null
User time (seconds): 10.89
System time (seconds): 0.83
Percent of CPU this job got: 180%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.50
Major (requiring I/O) page faults: 20
Minor (reclaiming a frame) page faults: 72843
Voluntary context switches: 1936
Involuntary context switches: 149
File system inputs: 393080
current sort with 4x 50M inputs
$ time -v sort_current /q/0/50M /q/1/50M /q/2/50M /q/3/50M > /dev/null
User time (seconds): 11.05
System time (seconds): 0.32
Percent of CPU this job got: 86%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.19
Major (requiring I/O) page faults: 18
Minor (reclaiming a frame) page faults: 72532
Voluntary context switches: 915
Involuntary context switches: 48
File system inputs: 198936
multidisk sort with 4x 100M input files
$ time -v sort_multidisk /q/0/100M /q/1/100M /q/2/100M /q/3/100M > /dev/null
User time (seconds): 24.84
System time (seconds): 1.35
Percent of CPU this job got: 242%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.82
Major (requiring I/O) page faults: 22
Minor (reclaiming a frame) page faults: 145101
Voluntary context switches: 3167
Involuntary context switches: 523
File system inputs: 784440
current sort with 4x 100M input files
$ time -v sort_current /q/0/100M /q/1/100M /q/2/100M /q/3/100M > /dev/null
User time (seconds): 24.23
System time (seconds): 0.65
Percent of CPU this job got: 83%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.77
Major (requiring I/O) page faults: 19
Minor (reclaiming a frame) page faults: 144797
Voluntary context switches: 3133
Involuntary context switches: 41
File system inputs: 784000
multidisk sort with 4x 200M input files
$ time -v sort_multidisk -S200M /q/0/200M /q/1/200M /q/2/200M /q/3/200M >
/dev/null
User time (seconds): 52.15
System time (seconds): 2.96
Percent of CPU this job got: 162%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:33.99
Major (requiring I/O) page faults: 35
Minor (reclaiming a frame) page faults: 256351
Voluntary context switches: 7215
Involuntary context switches: 1048
File system inputs: 1570224
current sort with 4x 200M input files:
$ time -v sort_current -S200M /q/0/200M /q/1/200M /q/2/200M /q/3/200M >
/dev/null
User time (seconds): 49.67
System time (seconds): 2.05
Percent of CPU this job got: 82%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:02.66
Major (requiring I/O) page faults: 20
Minor (reclaiming a frame) page faults: 102675
Voluntary context switches: 6637
Involuntary context switches: 256
File system inputs: 1566224
0001-sort-parallel-external-sort-implementation.patch
Description: Text Data