parallel

Re: Combining --pipe and --shebang options


From: Michel Samia
Subject: Re: Combining --pipe and --shebang options
Date: Tue, 20 Nov 2012 11:21:22 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121028 Thunderbird/16.0.2

On 19.11.2012 20:26, Ole Tange wrote:
> It actually is possible to combine --shebang and --pipe - but not the
> way you want.
>
> --shebang considers the rest of the file as data as if it was received
> on -a or through stdin.
>
> #!/usr/local/bin/parallel --shebang --pipe -k -j24 cat
> a
> b
> c
> d

I also tried it without --shebang. The problem is that the exec syscall on Linux doesn't tokenize the rest of the shebang line. It passes the rest of the line as one long argument containing spaces. The read_options subroutine fixes this at its very beginning when that argument starts with --shebang (I don't know Perl much, but I understood this :) ).

    # This must be done first as this may exec myself
    if(defined $ARGV[0] and ($ARGV[0]=~/^--shebang / or
                             $ARGV[0]=~/^--hashbang /)) {
        # Program is called from #! line in script
        $ARGV[0]=~s/^--shebang *//; # remove --shebang if it is set
        $ARGV[0]=~s/^--hashbang *//; # remove --hashbang if it is set
        my $argfile = pop @ARGV;
        # exec myself to split $ARGV[0] into separate fields
        exec "$0 --skip-first-line -a $argfile @ARGV";
    }
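To make the argument handling above concrete, here is a minimal POSIX sh sketch of what the quoted Perl does. The function name `rebuild_shebang_argv` is hypothetical and not part of GNU Parallel. The sketch assumes the plain case where the kernel supplies exactly two arguments: the rest of the #! line as one string, then the script path. It strips --shebang/--hashbang and takes the last argument as the argument file. It then splits the option string on whitespace, which is what Perl's one-string exec does when the string has no shell metacharacters.

```sh
# Hypothetical sketch of the read_options logic quoted above, in POSIX sh.
# $1: the rest of the #! line, delivered by the kernel as ONE argument,
#     e.g. "--shebang --pipe -k -j24 cat"
# $2: the path of the script file, appended by the kernel
rebuild_shebang_argv() {
  opts=${1#--shebang}        # remove --shebang if it is set
  opts=${opts#--hashbang}    # remove --hashbang if it is set
  argfile=$2                 # Perl: my $argfile = pop @ARGV;
  set -f                     # disable globbing while word-splitting $opts
  # Unquoted $opts is split on whitespace into separate arguments
  set -- --skip-first-line -a "$argfile" $opts
  set +f
  printf '%s\n' "$@"         # print the rebuilt argv, one item per line
}

rebuild_shebang_argv '--shebang --pipe -k -j24 cat' ./myscript
```

The rebuilt argument list is `--skip-first-line -a ./myscript --pipe -k -j24 cat`. Parallel then reads its input from the script file minus the #! line, roughly what `tail -n +2 ./myscript` would print.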

> What you want is to pass the rest of the file to python and let
> parallel chunk up stdin to the python script.
>
> I really like your idea, but it is clearly not a bug that it does not
> work currently.
>
> Your idea would be useful for any script (Shell, Perl, Python) which
> can either process only one file or process stuff from stdin. Using
> GNU Parallel your script can suddenly process many files/blocks of
> data and in parallel.
>
> I do not see how we can change the behaviour of --shebang, but we can
> invent a new option.

I agree: the semantics of --shebang are a little different from what I need, so we had better add a new option.

> So it could be something like:
>
> #!/usr/bin/parallel --shebang-program --pipe -k -j24 /usr/bin/python
>
> (Please come up with a better name than --shebang-program)

--shebang-program is maybe too long, but it sounds good. A shorter and still quite descriptive variant could be, for example, --script. Both are good; just choose whichever you prefer :)

> This should accept data on stdin which should be chunked and passed to
> the python program. So the program will be called like:
>
>    cat foo bar | my_program
>
> or:
>
>    my_program foo bar
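With --pipe, the program cuts stdin into chunks and gives each chunk to a separate run of the program. That idea can be sketched in POSIX sh as below. This is a serial illustration only. The helper name `pipe_chunks` and the fixed number of lines per chunk are assumptions. GNU Parallel itself splits its input by block size (--block, 1M by default) and runs the chunks concurrently.

```sh
# Hypothetical serial sketch of the --pipe idea: read stdin, cut it into
# chunks of N lines and feed each chunk to a fresh run of the command,
# so results come out in input order (like -k, but with no parallelism).
pipe_chunks() {
  n=$1
  shift
  buf=''
  count=0
  # Assumes every input line ends with a newline.
  while IFS= read -r line; do
    buf="$buf$line
"
    count=$((count + 1))
    if [ "$count" -eq "$n" ]; then
      printf '%s' "$buf" | "$@"   # one run of the command per full chunk
      buf=''
      count=0
    fi
  done
  # Flush the last, possibly shorter, chunk.
  if [ "$count" -gt 0 ]; then
    printf '%s' "$buf" | "$@"
  fi
}

# Five input lines in chunks of two: the line-counting command runs three times.
printf 'a\nb\nc\nd\ne\n' | pipe_chunks 2 awk 'END { print NR }'
```

Each run of awk sees only its own chunk on stdin, so it prints 2, 2 and 1. The script in this thread would likewise see one block of the input per run.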

> Without the --pipe:
>
> #!/usr/bin/parallel --shebang-program -k -j24 /usr/bin/python
>
> should work like:
>
>    parallel -k -j24 /usr/bin/python my_program {}
>
> So:
>
>    my_program foo bar
>    (echo foo; echo bar) | my_program
>
> should do the same as:
>
>    parallel -k -j24 /usr/bin/python my_program {} ::: foo bar
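Without --pipe, the proposal amounts to running the interpreter on the script once per argument. The arguments come from the command line or, if there are none, one per line from stdin. Below is a serial POSIX sh sketch of that mapping, in which the helper name `run_per_arg` is hypothetical. Real GNU Parallel would run the jobs concurrently (-j24), with -k keeping the output in input order.

```sh
# Hypothetical serial sketch of the proposed --shebang-program semantics
# without --pipe: run "INTERPRETER SCRIPT ARG" once for each argument.
# Arguments come from the command line, or one per line from stdin if
# none are given.
run_per_arg() {
  interp=$1
  script=$2
  shift 2
  if [ "$#" -gt 0 ]; then
    for arg in "$@"; do
      "$interp" "$script" "$arg"
    done
  else
    while IFS= read -r arg; do
      # </dev/null keeps a job from swallowing the remaining argument lines
      "$interp" "$script" "$arg" </dev/null
    done
  fi
}

# Both forms give the same two runs; echo stands in for /usr/bin/python.
run_per_arg echo my_program foo bar
printf 'foo\nbar\n' | run_per_arg echo my_program
```

Both calls print `my_program foo` followed by `my_program bar`. This mirrors the claim above that `my_program foo bar` and `(echo foo; echo bar) | my_program` should be equivalent.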

> We should allow for putting options on the command interpreter. E.g.:
>
> #!/usr/bin/parallel --shebang-program -k -j24 /usr/bin/perl -p
>
> Also without the --pipe the {} should work as expected:
>
> #!/usr/bin/parallel --shebang-program -k -j24 /usr/bin/perl -p {} {.}.out > {.}.log
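For the last example, it helps to spell out what the replacement strings expand to. {} is the input argument, and {.} is the argument with its last extension removed. GNU Parallel only removes the extension when the last path component contains a dot, so sub.dir/bar is left alone. The sketch below uses the hypothetical helpers `strip_ext` and `show_job` to print the command that would run for one argument. Placing the script path (assumed to be my_program) right after the interpreter and its options is an assumption, based on the earlier `parallel -k -j24 /usr/bin/python my_program {}` example.

```sh
# Hypothetical helper: mimic GNU Parallel's {.} by removing the last
# extension, but only if the last path component contains a dot.
strip_ext() {
  case ${1##*/} in
    *.*) printf '%s\n' "${1%.*}" ;;
    *)   printf '%s\n' "$1" ;;
  esac
}

# Hypothetical: print the command the proposed line
#   ... /usr/bin/perl -p {} {.}.out > {.}.log
# would run for one argument, assuming the script is named my_program.
show_job() {
  noext=$(strip_ext "$1")
  printf '/usr/bin/perl -p my_program %s %s.out > %s.log\n' "$1" "$noext" "$noext"
}

show_job foo.txt
show_job sub.dir/bar
```

For foo.txt, this prints `/usr/bin/perl -p my_program foo.txt foo.out > foo.log`. For sub.dir/bar, the name is kept whole in both the .out and .log names.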

> Are there things I am not covering? Are there use cases that this will
> not cover?

Thank you for all the use cases. I don't see any missing ones :) If someone finds any later, (s)he can open a bug report or send an e-mail to the mailing list ;)

If I knew Perl better, I could try to write a patch for this, but I think it would be better if a Perl programmer wrote it.


> /Ole
Michel
