RE: Parallelising grep


From: Cook, Malcolm
Subject: RE: Parallelising grep
Date: Fri, 9 Aug 2013 16:26:50 +0000

Assuming your shell is bash....

With this exported function

function slice {
# PURPOSE: After an optional -h lines of header (which are echoed
# unless suppressed with -sh), echo every <-n>th line (default:
# every line) starting with the <-m>th (counting from 1, starting
# with the first line after the header; default: starting with the
# <n>th line).
# AUTHOR: malcolm_cook@stowers.org
# EXAMPLE: slice -h=1 -sh -n=5 foo.tab > foo_every_fifth_line_after_the_one_line_header.tab
# set -e ;
perl -snwe 'BEGIN{our $n||=1; our $m=($n) unless defined($m); $m-=1; our $h||=0; die "required: m < n" unless $m < $n; our $sh} print $_ if (($. > $h ) ? (($. -1 - $h) % $n == $m) : ! $sh)' -- "$@"
}
export -f slice
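To make the selection rule concrete, here is a minimal sketch of the same every-nth-line idea written with awk instead of perl. The name `every_nth` is hypothetical and not part of the function above, and it leaves out the header handling (-h, -sh); it only shows that n slices with m = 1..n cover every input line exactly once.

```shell
# Hypothetical stand-in for slice: keep every n-th line, starting at line m
# (counting from 1). No header handling, unlike the perl version.
every_nth() {
    awk -v n="$1" -v m="$2" '(NR - 1) % n == m - 1'
}

# Split the numbers 1..10 into three round-robin slices.
for m in 1 2 3; do
    echo "slice $m: $(seq 1 10 | every_nth 3 "$m" | tr '\n' ' ')"
done
# slice 1: 1 4 7 10
# slice 2: 2 5 8
# slice 3: 3 6 9
```

Each job still reads the whole input and throws away the lines it does not own, so only the downstream filter is spread across jobs.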

 

 

...you can create parallel jobs, each of which greps one slice of in.bam.

Pass parallel's {#} (the job sequence number) as the value for -m, and pass the same number you give to parallel's -j as the value for -n.

You'll probably need to use parallel's -q and have each job call bash.

 

The following is untested.

 

parallel -j 10 -q bash -c 'samtools view in.bam | slice -n=10 -m={#} | fgrep -w -f read.ids' ::: {1..10} > alignments.txt

 

The output will have the slices interwoven.
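If the original line order matters, one option is to tag each line with its line number before filtering and sort on that number afterwards. The sketch below is only an illustration under stated assumptions: it uses plain background jobs in place of parallel so it is self-contained, the read IDs and the toy input standing in for `samtools view in.bam` are made up, and `restore_order_demo` is a hypothetical name.

```shell
# Toy input standing in for `samtools view in.bam`; the IDs are made up.
toy_input() { printf '%s\n' readA readB readC readD readE readF; }

restore_order_demo() {
    tmp=$(mktemp -d)
    printf '%s\n' readB readE readF > "$tmp/read.ids"
    n=3
    for m in 1 2 3; do
        # Keep every n-th line starting at line m, prefixed with its
        # original line number, then filter that slice by read ID.
        toy_input \
            | awk -v n="$n" -v m="$m" '(NR - 1) % n == m - 1 { print NR "\t" $0 }' \
            | grep -F -w -f "$tmp/read.ids" > "$tmp/$m.out" &
    done
    wait
    # Merge the slices, sort on the line-number column, then drop it.
    sort -n "$tmp"/*.out | cut -f2-
    rm -rf "$tmp"
}

restore_order_demo
# readB
# readE
# readF
```

One caveat: because the line number is prepended before grep runs, a purely numeric ID in read.ids could match the number column, so this only suits IDs that cannot be mistaken for line numbers.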

 

 

 

 

From: parallel-bounces+mec=stowers.org@gnu.org [mailto:parallel-bounces+mec=stowers.org@gnu.org] On Behalf Of Nathan S. Watson-Haigh
Sent: Friday, August 09, 2013 12:54 AM
To: parallel@gnu.org
Subject: Parallelising grep

 

I have a SAM/BAM file and I'd like to grep for the alignments of certain read IDs. I have the read ID strings in another file. I'm currently doing this with:

$ samtools view in.bam | fgrep -w -f read.ids > alignments.txt

Is it possible to parallelise the grep by having each grep process a different subset of read IDs from the read.ids file? Or is there an alternative way to parallelise this that I have overlooked?

 

Cheers,

Nathan

 

 

--

Nathan S. Watson-Haigh, PhD

Research Fellow in Bioinformatics

 


 

Australian Centre for Plant Functional Genomics (ACPFG)

School of Agriculture, Food and Wine

University of Adelaide Waite Campus

Plant Genomics Centre

Hartley Grove, Urrbrae

SA 5064

 


 

