parallel

The difference between "parallel cat >> file" and "parallel “cat >> file”"


From: Nan Xiao
Subject: The difference between "parallel cat >> file" and "parallel “cat >> file”"
Date: Fri, 11 Mar 2016 09:48:51 +0800

Hi all,

I am new to GNU Parallel. After reading the article "A Million Text Files And A Single Laptop" and its discussion, I want to
check whether my understanding of the difference between 'ls | parallel -m -j $f "cat {} >> ../transactions_cat/transactions.csv"' and
'ls | parallel -m -j $f cat {} >> ../transactions_cat/transactions.csv' is right:

(1) ls | parallel -m -j $f "cat {} >> ../transactions_cat/transactions.csv"

In this case, the jobs should be:

job 1: cat file1 >> ../transactions_cat/transactions.csv
job 2: cat file2 >> ../transactions_cat/transactions.csv
job 3: cat file3 >> ../transactions_cat/transactions.csv
......

Since the redirection to "../transactions_cat/transactions.csv" is part of each job, it is outside GNU Parallel's control. So there is
a contention issue: multiple processes write to the same file concurrently, and maybe a lock is needed.
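
A minimal sketch of case (1) in plain shell, without GNU Parallel itself, may make this concrete. The three sample files, the temporary directory, and the use of "sh -c" with background jobs are assumptions for illustration only; they stand in for the jobs that parallel would start. Each child shell performs the ">>" redirection on its own, so the launcher never sees the data:

```shell
# Sketch only: simulates three independent jobs appending to one shared file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input files.
printf 'a1\na2\n' > file1
printf 'b1\nb2\n' > file2
printf 'c1\nc2\n' > file3

for f in file1 file2 file3; do
    # The redirection is inside the quoted string, so each child shell
    # opens transactions.csv in append mode by itself.
    sh -c "cat $f >> transactions.csv" &
done
wait    # wait for all three background jobs

# All six lines arrive, but the order of the files depends on scheduling.
line_count=$(wc -l < transactions.csv | tr -d ' ')
echo "lines written: $line_count"

cd / && rm -rf "$workdir"
```

With inputs this small every line lands in the file, but which file comes first can differ from run to run. Whether larger outputs could interleave depends on the write sizes and the filesystem, which is the contention concern described above.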

(2) ls | parallel -m -j $f cat {} >> ../transactions_cat/transactions.csv

In this case, the jobs should be:

job 1: cat file1
job 2: cat file2
job 3: cat file3
......

Since the redirection to "../transactions_cat/transactions.csv" applies to parallel's own output, it is under GNU Parallel's control. GNU Parallel
can buffer the output of every job and write it to "../transactions_cat/transactions.csv" one job at a time, which makes sure the output
of different jobs doesn't get mixed up.
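
The buffering idea in case (2) can also be sketched in plain shell, again without GNU Parallel itself. The sample files and the per-job ".out" buffer files are hypothetical, and this is not how parallel is implemented internally; it only shows the principle that a single writer copies each job's buffered output to the shared file in turn:

```shell
# Sketch only: jobs write to private buffers, one writer appends them in turn.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input files.
printf 'a1\na2\n' > file1
printf 'b1\nb2\n' > file2
printf 'c1\nc2\n' > file3

for f in file1 file2 file3; do
    # Each job runs concurrently but writes only to its own buffer file.
    cat "$f" > "$f.out" &
done
wait    # wait for all three background jobs

# A single process owns the shared file and appends one buffer at a time,
# so the output of different jobs cannot be mixed.
for f in file1 file2 file3; do
    cat "$f.out"
done >> transactions.csv

result=$(cat transactions.csv)
expected=$(printf 'a1\na2\nb1\nb2\nc1\nc2')
echo "$result"

cd / && rm -rf "$workdir"
```

For simplicity the sketch appends the buffers in input order. As far as I understand, GNU Parallel by default prints each job's grouped output when that job finishes, so the order of the jobs in the real output file may differ unless -k (--keep-order) is given.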

Is my understanding right? If not, could someone correct me?

Thanks in advance!


Best Regards
Nan Xiao (肖楠)
Skype: xiaonan19830818
Jabber/XMPP: nanxiao@xmpp.ru.net 
Telegram: nanxiao
Personal website (Chinese): http://nanxiao.me/ 
Personal website (English): http://nanxiao.me/en 
Chinese DTrace website: http://chinadtrace.org/ 
