bug#14752: sort fails to fork() + execlp(compress_program) if overcommit limit is reached
Mon, 01 Jul 2013 10:52:00 +0100
On 07/01/2013 08:16 AM, Bernhard Voelker wrote:
> tag 14752 notabug
> close 14752
> On 06/30/2013 03:42 AM, Petros Aggelatos wrote:
>> I was trying to sort a big file (22GB, 5GB gzipped) with `sort
>> --compress-program=gzip -S40% data`. My /tmp filesystem is a 6GB tmpfs
>> and the total system RAM is 16GB. The problem was that after a while
>> sort would write uncompressed temp files in /tmp causing it to fill up
>> and then crash for having no free space.
> Thanks for reporting this. However, I think that your system's memory
> is just too small for sorting that file (that way, see below).
I'm not so sure there is nothing we might change here.
Petros said that the sort succeeded when overcommit was set to "always" (vm.overcommit_memory=1).
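For reference, on Linux the overcommit policy is exposed in /proc/sys/vm/overcommit_memory (0 = heuristic, the default; 1 = always overcommit; 2 = strict accounting). A minimal sketch for checking it, assuming a Linux /proc; the helper names are hypothetical:

```python
from pathlib import Path

OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"

# Meanings of vm.overcommit_memory as described in the Linux kernel docs.
OVERCOMMIT_MODES = {
    0: "heuristic",  # default: refuse only obviously excessive allocations
    1: "always",     # never refuse allocations on commit-accounting grounds
    2: "never",      # strict accounting against CommitLimit
}


def describe_overcommit(mode: int) -> str:
    """Map a vm.overcommit_memory value to a short name."""
    return OVERCOMMIT_MODES.get(mode, "unknown")


def read_overcommit_mode(path: str = OVERCOMMIT_PATH) -> int:
    """Read the current overcommit mode (Linux only)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__" and Path(OVERCOMMIT_PATH).exists():
    mode = read_overcommit_mode()
    print(f"vm.overcommit_memory = {mode} ({describe_overcommit(mode)})")
```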
Some notes on fork() overcommit behavior:
Petros, for completeness, what kernel were you using,
and was SELinux enabled?
> You already recognized yourself that sort(1) was writing huge chunk files
> into the /tmp directory which is a tmpfs file system, i.e., that all that
> data is decreasing the memory available for running processes.
> The overhead for spawning a new process is negligible compared to such
> an amount of data.
> In such a case, you're much better off telling sort(1) to use a different
> directory for the temporary files.
> Here's an excerpt from the texinfo manual
> (info coreutils 'sort invocation'):
> If the environment variable `TMPDIR' is set, `sort' uses its value
> as the directory for temporary files instead of `/tmp'. The
> `--temporary-directory' (`-T') option in turn overrides the environment variable.
> `-T TEMPDIR'
> Use directory TEMPDIR to store temporary files, overriding the
> `TMPDIR' environment variable. If this option is given more than
> once, temporary files are stored in all the directories given. If
> you have a large sort or merge that is I/O-bound, you can often
> improve performance by using this option to specify directories on
> different disks and controllers.
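Applied to the original command, that advice amounts to keeping the (compressed) temporaries off the tmpfs. A hedged sketch that builds and runs such a GNU sort invocation from Python; the scratch directory /var/tmp/sort-scratch and the 2G buffer in the example are placeholder assumptions, not recommendations from this thread:

```python
import subprocess
from typing import List, Optional, Sequence


def build_sort_argv(inputs: Sequence[str],
                    tmpdirs: Sequence[str] = (),
                    buffer_size: Optional[str] = None,
                    compress_program: Optional[str] = None) -> List[str]:
    """Assemble a GNU sort command line.

    Each entry of tmpdirs becomes its own -T option; -T overrides $TMPDIR,
    and with several -T options sort stores temp files in all of them.
    """
    argv = ["sort"]
    if buffer_size is not None:
        argv += ["-S", buffer_size]  # main-memory buffer, e.g. "40%" or "2G"
    if compress_program is not None:
        argv.append(f"--compress-program={compress_program}")
    for directory in tmpdirs:
        argv += ["-T", directory]  # keep temp files off tmpfs
    argv += list(inputs)
    return argv


def run_sort(argv: Sequence[str], output_path: str) -> None:
    """Run sort with stdout redirected to output_path; raise on failure."""
    with open(output_path, "wb") as out:
        subprocess.run(list(argv), stdout=out, check=True)


# Example (hypothetical paths):
# argv = build_sort_argv(["data"], ["/var/tmp/sort-scratch"], "2G", "gzip")
# run_sort(argv, "data.sorted")
```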
Note that tmpfs is backed by RAM and swap, so depending on the kernel's swappiness
setting, its pages will be migrated to the swap device(s) under memory pressure.
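Whether a given temp directory really lives on tmpfs (and so competes with processes for RAM and swap) can be read from the mount table. A small sketch, assuming the Linux /proc/mounts format; it picks the longest mount point that contains the path, and does not resolve symlinks or decode the octal escapes /proc/mounts uses for spaces. The function name is hypothetical:

```python
import os
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount containing path.

    mounts_text uses the /proc/mounts layout: device mountpoint fstype ...
    The longest containing mount point wins; on ties the later line wins,
    since a later mount on the same point hides the earlier one.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        inside = (path == mount_point or mount_point == "/"
                  or path.startswith(mount_point.rstrip("/") + "/"))
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print("/tmp is on", fs_type_for("/tmp", f.read()))
```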
BTW, -S40% may be the root of the issue.
Petros, have you tried smaller buffers? They would probably avoid the
fork() issue, and may also take better advantage of cache locality.
I.e., sort currently exploits a two-level memory hierarchy by allocating
large RAM buffers by default and assuming /tmp is the next storage level
down. Reducing the memory buffers to take advantage of ever-increasing
cache sizes further up the hierarchy may increase performance while also
avoiding the fork() issue.
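To put rough numbers on the -S40% concern using the figures from Petros's report (16GB RAM, 6GB tmpfs): the sort buffer alone is 6.4GB and a full /tmp pins another 6GB, leaving about 3.6GB. In the heuristic and strict overcommit modes the kernel may refuse a plain fork() whose private writable memory (here mostly that buffer) it can't account for. This back-of-the-envelope sketch ignores swap, page cache and the kernel's actual heuristics, so the "ok"/"tight" labels are only a hint; the function name is hypothetical:

```python
def memory_budget(total_gb: float, buffer_pct: float,
                  tmpfs_used_gb: float) -> tuple:
    """Return (sort_buffer_gb, remaining_gb) for a crude RAM budget.

    sort's -S N% is taken as N% of physical memory. Swap, page cache and
    the kernel's overcommit accounting are deliberately ignored.
    """
    buffer_gb = total_gb * buffer_pct / 100.0
    remaining_gb = total_gb - buffer_gb - tmpfs_used_gb
    return buffer_gb, remaining_gb


for pct in (40, 10, 2):
    buf, rest = memory_budget(16, pct, 6)
    # Crude rule of thumb: a fork() without overcommit may need roughly
    # `buf` more committable memory, so compare the two.
    verdict = "ok" if rest > buf else "tight"
    print(f"-S{pct}%: buffer {buf:.1f}GB, left {rest:.1f}GB, fork {verdict}")
```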
I previously made some notes on this here:
vfork() might be an option here. One can't rely on it behaving differently
from fork(), it blocks the parent until the child calls exec(), and there
are various restrictions on what the child may do, but that might be fine?
But I think posix_spawn() is the newer standardised equivalent,
and I notice the gnulib spawn-pipe module, which might be leveraged here?
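As an illustration of the posix_spawn() route (not a patch for sort, which is C and would go through gnulib), here is a sketch using Python's os.posix_spawnp (Python 3.8+), which wraps the C library call. It starts a compressor that reads from a pipe and writes to a file, the same shape as sort's --compress-program handling. As far as I know, recent glibc (2.24 and later) implements posix_spawn() on Linux with a vfork-style clone, so the parent's address space is not duplicated; whether that sidesteps the overcommit failure on a given system is an assumption worth testing. The helper names are hypothetical:

```python
import os
import tempfile


def spawn_compressor(program, out_path, args=()):
    """Start `program` with stdin on a new pipe and stdout on out_path.

    Returns (pid, write_fd). The caller writes the data to write_fd,
    closes it, and then reaps the child with os.waitpid().
    """
    read_fd, write_fd = os.pipe()  # both ends are close-on-exec in Python 3.4+
    file_actions = [
        # Child's stdin becomes the pipe's read end (dup2 clears close-on-exec).
        (os.POSIX_SPAWN_DUP2, read_fd, 0),
        # Child's stdout becomes the output file, created or truncated.
        (os.POSIX_SPAWN_OPEN, 1, out_path,
         os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600),
    ]
    try:
        # posix_spawnp searches PATH for `program`, like execlp().
        pid = os.posix_spawnp(program, [program, *args], os.environ,
                              file_actions=file_actions)
    except BaseException:
        os.close(write_fd)
        raise
    finally:
        os.close(read_fd)  # the parent only ever writes to the pipe
    return pid, write_fd


def run_compressor(program, data, out_path, args=()):
    """Feed `data` through `program` into out_path; True if it exited 0."""
    pid, write_fd = spawn_compressor(program, out_path, args)
    try:
        with os.fdopen(write_fd, "wb") as pipe:
            pipe.write(data)  # closing the pipe gives the child EOF
    finally:
        _, status = os.waitpid(pid, 0)  # always reap the child
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    # Round-trip through `cat` into a scratch directory; with "gzip" the
    # file would hold compressed data instead.
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "chunk")
        ok = run_compressor("cat", b"example\n", target)
        with open(target, "rb") as f:
            print(ok, f.read())
```

For sort's use case one would call something like run_compressor("gzip", chunk, temp_path) for each temporary chunk; the point of the sketch is only that no fork() of the large parent is involved.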