bug#36130: split bug

From: Heather Wick
Subject: bug#36130: split bug
Date: Fri, 7 Jun 2019 14:23:15 -0400

I am using split to split up some large, paired fastq files (nearly 4
billion lines each). I am using the -l flag to split them into files of 10
million reads (40 million lines) each. Although the fastq files have
matched and sorted reads, split is creating a different number of output
files for each of the two paired fastq files, and the pairing goes out of
sync at some point. The jobs finished without exceeding memory and with an
exit status of 0. I noticed the help text says to email this address about
bugs, so I thought I would mention it.
This is how I am calling split on my gzipped fastq files:

zcat MH1_R1.fastq.gz | split - -l 40000000 DHT_R1_
zcat MH1_R2.fastq.gz | split - -l 40000000 DHT_R2_

This creates 96 chunks for R1 and 95 chunks for R2, even though the
original fastq files have the same number of reads.
Do you have any suggestions for how to proceed? Perhaps zcatting and piping
the files is not the best way to call split?
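
One way to narrow this down is to count the lines in each decompressed
stream and work out how many chunks split -l should produce from that
count. With the same chunk size, two inputs with equal line counts always
give the same number of chunks, so 96 versus 95 suggests the two streams
differed in length by the time they reached split. The following is only a
minimal sketch: the helper names count_lines and expected_chunks are
hypothetical, and it assumes gzip and wc are available and that the file
names match the commands above.

```shell
#!/bin/sh
# count_lines FILE.gz
# Print the number of lines in a gzip-compressed file.
# (Hypothetical helper; gzip -dc is equivalent to zcat.)
count_lines() {
    gzip -dc -- "$1" | wc -l | tr -d ' '
}

# expected_chunks LINES PER_CHUNK
# Print how many output files `split -l PER_CHUNK` should create for
# LINES input lines, i.e. LINES / PER_CHUNK rounded up.
expected_chunks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Report line counts and expected chunk counts for both files, if present.
for f in MH1_R1.fastq.gz MH1_R2.fastq.gz; do
    [ -f "$f" ] || continue
    n=$(count_lines "$f")
    echo "$f: $n lines, $(expected_chunks "$n" 40000000) expected chunks"
done
```

If the two line counts differ, the inputs themselves are not the same
length (a truncated or corrupt .gz file is one possibility worth ruling
out). If they match but the chunk counts still differ, that would point
more strongly at the split step.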
~ Heather

Heather Wick
PhD Candidate, Human Genetics
Labs of Sarah Wheelan and Vasan Yegnasubramanian
Institute of Genetic Medicine
Johns Hopkins University School of Medicine
