Re: Limiting memory used by parallel?
From: hubert depesz lubaczewski
Subject: Re: Limiting memory used by parallel?
Date: Mon, 29 Jan 2018 18:19:28 +0100
User-agent: Mutt/1.5.23 (2014-03-12)
On Sun, Jan 28, 2018 at 02:45:42AM +0100, Ole Tange wrote:
> On Thu, Jan 25, 2018 at 4:33 PM, hubert depesz lubaczewski
> You can also use --cat:
>
> tar cf - /some/directory | parallel -j 5 --pipe --block 5G --cat \
>   --recend '' 'cat {} | ./handle-single-part.sh {#}'
>
> This way each block is saved to the tempdir before the job starts. By
> my limited testing this should make GNU Parallel only keep 1-2 blocks
> in memory.
So, I did try it.
To keep things as simple as possible, I used this as the data source:
dd if=/dev/zero bs=8k count=13107200
which generates 100 GiB (8 KiB x 13,107,200 blocks) of \x00 bytes.
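As a quick sanity check of that size: dd reads the "8k" block size as 8 x 1024 bytes, so the stream is 8,192 x 13,107,200 bytes, which is exactly 100 GiB. The snippet below only does that arithmetic; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Size of the stream produced by: dd if=/dev/zero bs=8k count=13107200
# dd's "k" suffix multiplies by 1024.
block_size=$(( 8 * 1024 ))
block_count=13107200
total_bytes=$(( block_size * block_count ))
gib=$(( total_bytes / 1024 / 1024 / 1024 ))
echo "${total_bytes} bytes = ${gib} GiB"
```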
The dd output was then piped into:
1. in the "normal" test:
/home/depesz/parallel/bin/parallel \
-j 5 \
--pipe \
--block 2000M \
--recend '' \
/home/depesz/test/handle-single-part.sh \
"/tmp/depesz/out/tarball.part-{#}.gz.aes"
2. in the "cat" test:
/home/depesz/parallel/bin/parallel \
-j 5 \
--pipe \
--block 2000M \
--recend '' \
--cat \
/home/depesz/test/handle-single-part.sh {} \
"/tmp/depesz/out/tarball.part-{#}.gz.aes"
The handle-single-part.sh script was adjusted for each case. In the "normal" case it ran:
cat - | gzip -9c - | openssl enc -pass "file:pass.file" -aes-256-cbc > "${output_file}"
and in the "cat" test it ran:
gzip -9c "${input_file}" | openssl enc -pass "file:pass.file" -aes-256-cbc > "${output_file}"
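For reference, the complete helper might look roughly like the sketch below. Only the two gzip/openssl pipelines come from this message; the argument handling (one argument = output file with the block on stdin, two arguments = temporary input file plus output file, matching the two parallel command lines) and the function name handle_single_part are assumptions. As in the original pipelines, pass.file is looked up in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of handle-single-part.sh. The gzip/openssl
# pipelines are from the message; the argument handling is assumed:
#   normal mode: handle-single-part.sh OUTPUT         (block arrives on stdin)
#   cat mode:    handle-single-part.sh INPUT OUTPUT   (block is a temp file)
set -euo pipefail

handle_single_part() {
    if [ "$#" -eq 1 ]; then
        # Normal --pipe mode: GNU Parallel feeds the block on stdin.
        # "cat -" is redundant but kept to match the tested pipeline.
        local output_file=$1
        cat - | gzip -9c - | openssl enc -pass "file:pass.file" -aes-256-cbc > "${output_file}"
    elif [ "$#" -eq 2 ]; then
        # --cat mode: {} expands to a temporary file holding the block.
        local input_file=$1 output_file=$2
        gzip -9c "${input_file}" | openssl enc -pass "file:pass.file" -aes-256-cbc > "${output_file}"
    else
        echo "usage: $0 [INPUT] OUTPUT" >&2
        return 2
    fi
}

# Only do work when called with arguments (as GNU Parallel would call it).
if [ "$#" -gt 0 ]; then
    handle_single_part "$@"
fi
```

Used as a standalone script, the normal run corresponds to "... | handle-single-part.sh out.gz.aes" and the cat run to "handle-single-part.sh /tmp/block out.gz.aes".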
Results of the test (timings):

        normal       cat
real    3m45.748s    5m38.099s
user    12m51.147s   13m7.587s
sys     7m6.878s     9m11.370s
So the cat variant is clearly slower, taking about 50% longer in wall-clock time, because it has to write the uncompressed data to disk and then read it back.
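To put a number on the wall-clock difference, the "real" values can be converted to seconds and divided. The helper name to_seconds is made up, and it assumes the "XmY.YYYs" format that bash's time keyword prints.

```shell
#!/usr/bin/env bash
# Convert time values such as "3m45.748s" to seconds by splitting on
# 'm' and 's', then compare the "real" figures of the two runs.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

normal_real=$(to_seconds 3m45.748s)
cat_real=$(to_seconds 5m38.099s)
ratio=$(awk -v a="$cat_real" -v b="$normal_real" 'BEGIN { printf "%.2f", a / b }')
echo "normal: ${normal_real}s, cat: ${cat_real}s, cat/normal = ${ratio}"
```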
What's worse is the memory usage. Every second, I logged the output of "ps uwf t <terminal-that-i-was-running-the-test-on>", and for each such ps dump I summed the RSS column over all processes.
In the normal test, the worst (peak) total RSS was 12,402,552 kB; in the cat test it was 12,382,736 kB.
So there is no real difference in memory usage (19,816 kB, about 0.16%), but the cat approach is significantly slower.
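The measurement procedure can be sketched as two small shell functions. The "ps uwf t" call and the idea of summing RSS per dump and keeping the worst total come from this message; the function names, the log file, and the "=== <epoch seconds>" separator line written before each dump (so the dumps can be told apart) are assumptions. RSS is the sixth column of "ps u" output and is reported in kB.

```shell
#!/usr/bin/env bash
# Append one "ps uwf t TTY" dump per second to a log (stop with Ctrl-C).
# The "=== <timestamp>" separator is an assumed log format.
log_ps() {
    local tty=$1 log=$2
    while sleep 1; do
        echo "=== $(date +%s)"
        ps uwf t "$tty"
    done >> "$log"
}

# Sum the RSS column (6th field, kB) of each dump and print the largest total.
peak_rss_kb() {
    awk '
        /^===/       { if (sum > max) max = sum; sum = 0; next }
        $1 == "USER" { next }    # skip the ps header line
                     { sum += $6 }
        END          { if (sum > max) max = sum; print max + 0 }
    ' "$@"
}
```

Usage: run "log_ps pts/3 ps.log" while the test runs (pts/3 being a placeholder for the test terminal), stop it afterwards, then run "peak_rss_kb ps.log".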
The full ps output is available at:
normal test: https://share.riseup.net/#IfwBFcQEr0qI3HuBKTpvDA
cat test: https://share.riseup.net/#QhtkCvfjrM6zu4oua5FhRg
Best regards,
depesz