bug-coreutils

From: Leo Butler
Subject: Re: sort: memory exhausted with 50GB file
Date: Fri, 25 Jan 2008 23:13:22 +0000 (GMT)

I have pasted in the requested information below.
Leo

On Fri, 25 Jan 2008, Bob Proulx wrote:

< Leo Butler wrote:
< > -16 -2 -14 -5 1 1 0 0.3080808080808057 0 0.1540404040404028 0.3904338415207971
< 
< That should be fine.
< 
< > I have a dual processor machine, with each processor being an Intel Core 2 
< > Duo E6850, rated at 3GHz and cache 4096 kB, with 3.8GB total physical 
< > memory and 4GB swap space and two partitions on the hdd with 200GB and 
< > 140GB available space.
< 
< Sounds like a very nice machine.

Yes, very. It pays to be nice to the sysadmin ;-).
 
< > I am using sort v. 5.2.1 and v. 6.1 & v. 6.9. The former is installed as 
< > part of the RHEL OS and the latter two were compiled from the source at 
< > http://ftp.gnu.org/gnu/coreutils/ with the gcc v. 3.4.6 compiler.
< 
< All good so far.  To nail down two more details, could you provide the
< output of these commands?
< 
<   uname -a

$ uname -a
Linux erdelyi.maths.ed.ac.uk 2.6.9-67.0.1.ELsmp #1 SMP Fri Nov 30 11:51:05 EST 2007 i686 i686 i386 GNU/Linux


<   ldd --version | head -n1

$ ldd --version | head -n1 
ldd (GNU libc) 2.3.4


< 
<   file /usr/bin/sort ./sort

$ file /bin/sort
/bin/sort: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), stripped

$ file ~/c/coreutils/coreutils-6.1/src/sort
/home/lbutler/c/coreutils/coreutils-6.1/src/sort: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped

$ file ~/c/coreutils/coreutils-6.9/src/sort
/home/lbutler/c/coreutils/coreutils-6.9/src/sort: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped


< 
< That will give us the kernel and libc versions.  The last command will
< report whether the binaries are 32-bit or 64-bit.
< 
< > When I attempt to sort the file, with a command like
< > 
< > ./sort -S 250M -k 6,6n -k 7,7n -k 8,8n -k 9,9n -k 10,10n -k 11,11n -T /data -T /data2 -o out.sort in.txt
< > 
< > sort rapidly chews up about 40-50% of total physical memory (=1.5-1.9GB) at 
< > which point the error message 'sort: memory exhausted' appears. This 
< > appears to be independent of the parameter passed through the -S option.
< > ...
< > Is this an idiosyncratic problem?
< 
< That is very strange.  If by idiosyncratic you mean, is this particular
< to your system?  Probably, because I have routinely sorted large files
< without problem.  But that doesn't mean it isn't a bug.



I don't know if this is relevant, but I have extracted the 2nd through 1000th 
characters of the 50GB file, and there appear to be garbage (unprintable) 
characters in the first line. The remainder of the extract looks fine. 
Moreover, I split the file into 500MB chunks, sorted these and then 
merge-sorted the pairs. It appears that the 500MB chunks produced by split 
have been stripped of '\n' and are garbage, as are the sorted files.

I can email a sample if need be. 
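
For anyone who wants to repeat the check, something along these lines dumps
the same range (od -c shows any unprintable bytes as octal escapes):

$ dd if=in.txt bs=1 skip=1 count=999 2>/dev/null | od -c | head   # chars 2-1000 of in.txt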

< 
< At 50G the data file is very large compared to your 4G of physical
< memory.  This means that sort cannot sort it in memory.  It will open
< temporary files and sort a large chunk to one file and then another
< and then another as a first pass splitting up the input file into many
< sorted chunks.  As a second pass it will merge-sort the sorted chunks
< together into the output file.

Yes, I have successfully sorted a 7GB file of a similar type on an older
machine. I noticed that sort was employing several clever tricks.
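
For what it's worth, the two-pass scheme described above can be mimicked by
hand. A rough sketch, with an arbitrary chunk size and placeholder file
names (not the exact commands I ran):

$ KEYS='-k 6,6n -k 7,7n -k 8,8n -k 9,9n -k 10,10n -k 11,11n'
$ split -l 10000000 in.txt chunk.            # pass 1a: split on line boundaries
$ for f in chunk.??; do sort $KEYS -o $f.sorted $f; done    # pass 1b: sort each piece
$ sort -m $KEYS -o out.sort chunk.??.sorted  # pass 2: merge the sorted runs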


< 
< What is the output of this command on your system?
< 
<   sysctl vm.overcommit_memory

$ /sbin/sysctl vm.overcommit_memory
vm.overcommit_memory = 0


< 
< I am asking because by default the linux kernel overcommits memory and
< does not return out-of-memory conditions.  Instead the process (or
< some other one) is killed by the linux out-of-memory killer.  But
< enterprise systems are often configured with overcommit disabled for
< reliability reasons, and that appears to be how your system is
< configured, because otherwise you wouldn't see a message from sort
< about being out of memory.  (I always disable overcommit so as to
< avoid the out-of-memory killer.)
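
For reference, the 0 above is the kernel's default heuristic mode; as I
understand it, strict accounting (overcommit disabled) would be set like
so, with the ratio value only an example:

$ /sbin/sysctl -w vm.overcommit_memory=2   # refuse to overcommit
$ /sbin/sysctl -w vm.overcommit_ratio=80   # commit limit = swap + 80% of RAM
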
< 
< Do you have user process limits active?  What is the output of this
< command?
< 
<   ulimit -a

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 1024
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 75776
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
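
Nothing there looks restrictive to me. For completeness, the limits in the
exact shell that launches sort can be checked too, in case they differ from
my interactive ones:

$ sh -c 'ulimit -v'   # virtual memory limit seen by a child shell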


< 
< What does free say on your system?
< 
<   free

$ free
             total       used       free     shared    buffers     cached
Mem:       4008144    3124208     883936          0     140216    2433740
-/+ buffers/cache:     550252    3457892
Swap:      4192956     101976    4090980
$ 


< 
< > I have read backlogs of the list and people report sort-ing 100GB
< > files. Do you have any ideas?
< 
< Without doing a lot of debugging I am wondering if your choice of
< locale setting is affecting this.  I doubt it because all of the sort
< fields are numeric.  But because this is easy enough could you try
< sorting using LC_ALL=C and see if that makes a difference?
< 
<   LC_ALL=C sort -k 6,6n -k 7,7n -k 8,8n -k 9,9n -k 10,10n -k 11,11n -T /data -T /data2 -o out.sort in.txt

There is no change in behaviour.


< 
< Also could you determine how large the process is at the time that
< sort reports running out of memory?  I am wondering if it is at a
< magic number size such as 2G or 4G that could provide more insight
< into the problem.

There isn't a magic number, but sort appears to hold between 40% and 50% of 
physical memory (with a low of 38%) at the point it runs out of memory.
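
For anyone repeating this, the process size can be sampled while sort runs
with something like:

$ while pgrep -x sort >/dev/null; do ps -C sort -o pid,vsz,rss,pmem; sleep 10; done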

Thanks
Leo



