Re: Taking advantage of L1 and L2 cache in sort
From: Pádraig Brady
Subject: Re: Taking advantage of L1 and L2 cache in sort
Date: Wed, 03 Mar 2010 01:06:06 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100216 Thunderbird/3.0.2

On 02/03/10 18:20, Chen Guo wrote:
> This is exactly what that guy Shaun Jackman was talking about earlier.
> I'm actually really surprised this is faster, if I can dig up his e-mail I'll
> forward him this, I remember him saying something about experimenting
> with exactly this.

I missed that thread, but yes, he had pretty much the
same idea that I stumbled on when trying to perturb
the posix_fadvise() testing by changing the buffer size.
http://lists.gnu.org/archive/html/bug-coreutils/2010-02/msg00151.html
Spooky :)
Shaun, you can use `taskset` to set process affinity BTW.
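
For anyone who wants to try that, here is a minimal sketch of pinning a sort run to a single core with `taskset -c`. The input is a generated stand-in file and CPU 0 is an arbitrary choice; neither comes from the runs in this thread.

```shell
# Stand-in input: 100000 numbers in descending order (not the thread's test file).
input=$(mktemp)
output=$(mktemp)
seq 100000 -1 1 > "$input"

# Pin the whole sort process to CPU 0 so it cannot migrate between cores.
# -S1M limits sort's main memory buffer to 1 MiB, as in the second run in this mail.
LANG=C taskset -c 0 sort -n -S1M "$input" > "$output"

# Sanity check: the smallest number should now come first.
first_line=$(head -n 1 "$output")
echo "first line after pinned sort: $first_line"

rm -f "$input" "$output"
```

To check which CPUs an already running process may use, `taskset -p PID` prints its affinity mask.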

> Can you profile the difference in the number of I/O system calls?

$ TMPDIR=/ram LANG=C /usr/bin/time -v strace -c ./src/sort sort.t/sort.1.test > /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 70.70    0.283077       21775        13           read
 28.97    0.115983       19331         6           munmap
  0.32    0.001268           0     21609           write
  0.01    0.000054           8         7           open
  0.00    0.000000           0         9           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1         1 ioctl
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         5           mprotect
  0.00    0.000000           0        25           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         4           getrlimit
  0.00    0.000000           0        16           mmap2
  0.00    0.000000           0         9           fstat64
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           fadvise64_64
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.400382                 21717         3 total
Command being timed: "strace -c ./src/sort sort.t/sort.1.test"
User time (seconds): 26.91
System time (seconds): 2.01
Percent of CPU this job got: 90%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:32.02
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3
Minor (reclaiming a frame) page faults: 181060
Voluntary context switches: 87362
Involuntary context switches: 2526
Swaps: 0
File system inputs: 173504
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
$ TMPDIR=/ram LANG=C /usr/bin/time -v strace -c ./src/sort -S1M sort.t/sort.1.test > /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 38.95    0.035011           1     60991           read
 33.47    0.030081          90       334           unlink
 24.17    0.021721           0     81864           write
  2.07    0.001862           2      1006           munmap
  0.75    0.000670           1       673           open
  0.23    0.000209           0      1016           mmap2
  0.19    0.000167           0       675           fstat64
  0.09    0.000085           0       675           close
  0.07    0.000062           0       334           fcntl64
  0.02    0.000018           0      1337           rt_sigprocmask
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1         1 ioctl
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         5           mprotect
  0.00    0.000000           0       334           _llseek
  0.00    0.000000           0        25           rt_sigaction
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0       335           fadvise64_64
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.089886                149618         3 total
Command being timed: "strace -c ./src/sort -S1M sort.t/sort.1.test"
User time (seconds): 21.76
System time (seconds): 4.51
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:26.79
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 3
Minor (reclaiming a frame) page faults: 23038
Voluntary context switches: 598317
Involuntary context switches: 2316
Swaps: 0
File system inputs: 173504
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
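
To put the two runs side by side, here is a rough calculation using only figures copied from the output above (first run: default buffer size, second run: -S1M). Both runs were made under strace, so treat the absolute times as inflated; the ratios are only indicative.

```shell
# All numbers below are copied from the two strace/time reports in this mail.
summary=$(awk 'BEGIN {
    printf "read calls:        %d -> %d\n", 13, 60991
    printf "write calls:       %d -> %d\n", 21609, 81864
    printf "total syscalls:    %.1fx more with -S1M\n", 149618 / 21717
    printf "minor page faults: %d -> %d\n", 181060, 23038
    printf "user time:         %.1f%% less with -S1M\n", (26.91 - 21.76) / 26.91 * 100
    printf "elapsed time:      %.1f%% less with -S1M\n", (32.02 - 26.79) / 32.02 * 100
}')
echo "$summary"
```

So the 1 MiB buffer makes far more system calls and uses more system time (4.51 s against 2.01 s), yet the run still finishes sooner because user time drops by more than system time rises.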
cheers,
Pádraig.