bug-coreutils

Re: sort: memory exhausted with 50GB file


From: Jim Meyering
Subject: Re: sort: memory exhausted with 50GB file
Date: Sat, 26 Jan 2008 16:36:08 +0100

Jim Meyering <address@hidden> wrote:

> Leo Butler <address@hidden> wrote:
>
>> < Paul Eggert <address@hidden> wrote:
>> < ...
>> < > Hmm, it sounds like your input data has some very long lines, then.
>> < > That would explain at least part of your problem, then.  'sort' needs
>> < > to keep at least two lines in main memory to compare them: if single
>> < > input lines are many gigabytes long, then 'sort' must consume many
>> < > gigabytes of memory, regardless of what parameter you specify with '-S'.
>> <
>> < You can run this to find the maximum line length:
>> <
>> <   wc --max-line-length your-data
> ...
>> $ /usr/bin/wc -L /data/espace/k_400_a.out
>> 107
>
> That would have worked if your data really did have
> the form you originally described.
>
> With binary data, you have to be careful.
> E.g., translate all non-printable/space bytes to "."
> before using wc -L:
>
>   tr -c '[:print:][:space:]' '[.*]' < your-data | wc -L

Or you could simply translate every non-newline byte to, e.g., ".",
so that wc -L reports the longest line's length in bytes:

   tr -c '\n' '[.*]' < your-data | wc -L
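
As a rough sketch of the difference, the snippet below wraps that
pipeline in a small helper and feeds it a made-up sample. The function
name max_line_bytes and the sample bytes are assumptions for
illustration only; they do not come from the thread or from coreutils.

```shell
# Hypothetical helper (not part of coreutils): report the longest line
# on stdin, measured in bytes rather than display columns.
max_line_bytes() {
  # -c complements the set, so every byte except newline becomes ".";
  # '[.*]' repeats "." as often as needed to pad the second set.
  tr -c '\n' '[.*]' | wc -L
}

# Made-up sample: a 5-byte line containing NUL and control bytes,
# followed by a 3-byte line.
sample() { printf 'a\000\001\002b\nxyz\n'; }

# Plain wc -L measures display width, so non-printable bytes may go
# uncounted and the first line can be under-reported.
sample | wc -L

# After the translation every byte occupies one column, so the first
# line is measured at its full 5 bytes.
sample | max_line_bytes
```

If the second number is much larger than the first on your real data,
that suggests the file has long runs of non-printable bytes between
newlines.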



