coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Sort Files Merge


From: Gil Shomron
Subject: Sort Files Merge
Date: Fri, 26 Feb 2016 00:29:48 +0200
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

Hi all,

I've been investigating sort functionality when sorting big files (larger than RAM size). I'm looking at the system calls and the actual read and write related instructions the application uses. For sort to manage large files, my guess is, it implements some kind of external merge sort algorithm. And indeed, I can see sort load chunks from the big file, sorts them, and saves them back to disk when sorted.

But looking a bit further there are two things I do not understand:
1) There are plenty of reads to different buffers of different sizes (49152, 36864, 28672 for example) - why? I mean, why it doesn't just read the whole chunk once and then merge sort it? 2) After splitting and sorting each chunk, sort needs to merge them to one sorted list. When I examine the instructions it seems like sort does A LOT of accesses to those buffers in order to sort them, almost at the same intensity of merging each chunk - why? doesn't it just need to look at the beginning of each file, check where is the smallest value, put it in an output buffer and then write it to the final output file?

I've been digging in the code, but unfortunately didn't find my answers there yet.

I will really appreciate your help with helping me understand how sort is really implemented.

Thanks!
   Gil Shomron.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]