bug#14752: sort fails to fork() + execlp(compress_program) if overcommit limit is reached

From: Bernhard Voelker
Subject: bug#14752: sort fails to fork() + execlp(compress_program) if overcommit limit is reached
Date: Mon, 01 Jul 2013 09:16:37 +0200

On 06/30/2013 03:42 AM, Petros Aggelatos wrote:
> I was trying to sort a big file (22GB, 5GB gzipped) with `sort
> --compress-program=gzip -S40% data`. My /tmp filesystem is a 6GB tmpfs
> and the total system RAM is 16GB. The problem was that after a while
> sort would write uncompressed temp files in /tmp causing it to fill up
> and then crash for having no free space.

Thanks for reporting this.  However, I think that your system's memory
is just too small for sorting that file this way (see below).

You already noticed yourself that sort(1) was writing huge chunk files
into the /tmp directory, which is a tmpfs file system, i.e., all that
data reduces the memory available to running processes.
The overhead of spawning a new process is negligible compared to such
an amount of data.

In such a case, you're much better off telling sort(1) to use a different
directory for the temporary files.
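
As a rough sketch of what I mean (the directory below is only a
stand-in created for the demo, and the three-line input is made up; in
real use you would point -T at a directory on a disk-backed file system
with enough free space, not at a tmpfs):

```shell
# Demo-only scratch area; a hypothetical stand-in for a directory on a real disk.
scratch=$(mktemp -d)
mkdir "$scratch/sorttmp"

# Tiny made-up input so the example runs quickly.
printf '3\n1\n2\n' > "$scratch/data"

# Same options as in the report, plus -T to redirect the temporary files.
sort --compress-program=gzip -S40% -T "$scratch/sorttmp" \
     "$scratch/data" > "$scratch/sorted"
```

With input this small sort will not actually spill to disk; the point is
only where the temporary files would go if it did.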

Here's an excerpt from the texinfo manual
(info coreutils 'sort invocation'):

     If the environment variable `TMPDIR' is set, `sort' uses its value
  as the directory for temporary files instead of `/tmp'.  The
  `--temporary-directory' (`-T') option in turn overrides the environment
  variable.

  `-T TEMPDIR'
  `--temporary-directory=TEMPDIR'
       Use directory TEMPDIR to store temporary files, overriding the
       `TMPDIR' environment variable.  If this option is given more than
       once, temporary files are stored in all the directories given.  If
       you have a large sort or merge that is I/O-bound, you can often
       improve performance by using this option to specify directories on
       different disks and controllers.
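
The two mechanisms from the excerpt could be used roughly like this.
This is only a sketch: "disk1" and "disk2" are hypothetical directories
created for the demo (in practice they would live on separate physical
disks), and the input file is made up.

```shell
# Demo-only layout; disk1 and disk2 stand in for directories on different disks.
work=$(mktemp -d)
mkdir "$work/disk1" "$work/disk2"
printf 'pear\napple\nfig\n' > "$work/data"

# Variant 1: set TMPDIR for this one invocation only.
TMPDIR="$work/disk1" sort --compress-program=gzip \
    "$work/data" > "$work/sorted_env"

# Variant 2: give -T more than once so temporary files are spread
# over all the listed directories (this also overrides TMPDIR).
sort --compress-program=gzip -T "$work/disk1" -T "$work/disk2" \
    "$work/data" > "$work/sorted_multi"
```

Both variants produce the same sorted output; they differ only in where
sort would place its temporary files.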

Have a nice day,

