[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Goals of threaded sort
From: |
Chen Guo |
Subject: |
Goals of threaded sort |
Date: |
Mon, 23 Nov 2009 11:03:34 -0800 (PST) |
Hi all,
I'm looking for some group input/guidance. Is there a kind of performance
goal we want for a threaded sort implementation? Something I can work towards?
I submitted a patch earlier, but obviously it had its problems: copious
amounts of copying, worst case n^2, high variance in results, and in the end it
was only slightly faster than the previous threaded sort patch, and only at 8
threads and up. I've optimized the code further, but obviously the copying,
O(n^2), and variance issues still exist.
What me and Professor Eggert are exploring now are sorting networks, in
particular I've implemented a basic bitonic sort. The potential for parallelism
is fairly high, for example a test that took 12.6s user time took 2.5s real
time.
The biggest issue here is it is O(n (log n )^2). As a simple test, I
inserted an expensive operation for each time a struct line is copied, and
another (1.5 times as expensive, guessing here) for each time a compare is
made. On 8 threads, for up to 2^10 lines bitonic sort is faster or the same as
the current threaded implementation. Note that this test essentially makes
negligible the cost of extra recursion in bitonic sort when merging.
While bitonic sort parallelizes pretty well, this isn't something practical
right now. Maybe in 5, 10 years, when every computer has 16 cores to throw
around, or GPGPUs become widespread, I can see this approach succeeding, but
not now.
So I guess what I'm looking for is, are there any suggestions regarding a
direction to take? It seems to me that while the current submitted patch is
simple, it loses out on parallelizing potential. The algorithms that
parallelize better invariably end up requiring more work, thus not being much
faster anyway. Professor Eggert and I are both closed to tapped out ideas wise,
do you guys have any ideas for creating a compromise between the two?
- Goals of threaded sort,
Chen Guo <=