RE: Design of --header when using --pipe

From:	Cook, Malcolm
Subject:	RE: Design of --header when using --pipe
Date:	Wed, 23 Nov 2011 14:59:38 -0600

Ole,

Great directions.  

I've used --pipe a few times now with .fasta files, using '^>' as the record 
separator.
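
To illustrate what splitting on '^>' means, here is a minimal Python sketch. 
The function name split_fasta_records is hypothetical, and this is only a 
model of the idea, not how parallel itself does it: a new record starts at 
every line that begins with '>'.

```python
def split_fasta_records(lines):
    """Group lines into records; each record starts at a line beginning with '>'."""
    records = []
    current = []
    for line in lines:
        # A '>' at the start of a line opens a new record,
        # so close the record collected so far (if any).
        if line.startswith(">") and current:
            records.append(current)
            current = []
        current.append(line)
    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    sample = [">seq1\n", "ACGT\n", ">seq2\n", "TTGA\n", "GGCC\n"]
    for record in split_fasta_records(sample):
        print(repr("".join(record)))
```

Since a block would only ever be cut between two such records, no sequence 
gets separated from its '>' line.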

I think you cover the cases that I've thought about so far.

Also, I've wanted to be able to define blocks in terms of a number of lines.  For 
instance, the fastq format starts a new record every 4 lines.  Is there a way to 
block on line number (i.e. candidate block boundaries are where the line number 
is divisible by 4)?
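
A rough Python sketch of what I mean, in case it helps. The names 
fastq_blocks, lines_per_record and min_block_size are made up for this 
example, and size is counted with len() on each line, which is an 
assumption about how a block size would be measured: a block may only end 
where the line number is divisible by 4, and it ends at the first such line 
once it has reached the minimum size.

```python
def fastq_blocks(lines, lines_per_record=4, min_block_size=1_000_000):
    """Yield blocks that always contain a whole number of records."""
    block = []
    size = 0
    for lineno, line in enumerate(lines, start=1):
        block.append(line)
        size += len(line)
        # Cutting is only allowed on a record boundary, i.e. where the
        # line number is divisible by lines_per_record.
        if lineno % lines_per_record == 0 and size >= min_block_size:
            yield "".join(block)
            block = []
            size = 0
    # Whatever is left over forms the final (possibly smaller) block.
    if block:
        yield "".join(block)


if __name__ == "__main__":
    reads = ["@r%d\n" % i if i % 4 == 0 else "x\n" for i in range(12)]
    for b in fastq_blocks(reads, min_block_size=10):
        print(b.count("\n"), "lines")
```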

Best,

~Malcolm


> -----Original Message-----
> From: parallel-bounces+mec=stowers.org@gnu.org [mailto:parallel-
> bounces+mec=stowers.org@gnu.org] On Behalf Of Ole Tange
> Sent: Wednesday, November 23, 2011 2:46 PM
> To: parallel@gnu.org
> Subject: Design of --header when using --pipe
> 
> I have seen others ask for it and now I have even had use for it
> myself: A way to repeat a header for each block when using --pipe
> 
> If you are processing a big CSV-file and the first line is the column
> names you want this line to be repeated for each block passed to a
> parallel process.
> 
> The simple fix is just to assume that the header is a single line. But
> I think we can do better than that.
> 
> I would like to at least be able to process these 4 types of headers:
> 
> * The CSV-header: A single line. Maybe extended to a given number of
> lines that can be 1?
> 
> * A header that has multiple lines prepended with a special character:
> 
> % header1
> % header2
> data
> data
> 
> * A header that has a symbol dividing header from body. E.g. \n\n in emails:
> 
> From root@alpha.tange.dk Mon Apr 23 10:20:38 2007
> Return-path: <root@alpha.tange.dk>
> From: Anacron <root@alpha.tange.dk>
> To: root@alpha.tange.dk
> Subject: Anacron job 'cron.daily' on alpha
> Message-Id: <E1Hftmo-0001Jn-D5@localhost>
> Date: Mon, 23 Apr 2007 10:20:38 +0200
> 
> data
> data
> 
> * A fixed length header in bytes, so --pipe can process binary data
> with a fixed block length.
> 
> This header is 25 bytes.
> This data is taking up 33 bytes.
> This data is 33 bytes in length.
> Thirty three bytes used for this
> Space for this: 33 bytes incl \n
> 
> Do you have other data files with headers that would require different
> treatment?
> 
> 
> /Ole
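
For reference, the four header types listed in the quoted message could be 
separated from the body roughly as in the Python sketch below. This is only 
an illustration under stated assumptions: split_header and its kind/arg 
parameters are hypothetical names invented here, not options of parallel, 
and the input is assumed to fit in memory as bytes.

```python
def split_header(data, kind, arg):
    """Return (header, body) for bytes `data`.

    kind == "lines":     arg is the number of header lines
    kind == "prefix":    arg is the bytes that every header line starts with
    kind == "separator": arg is the bytes that end the header (e.g. b"\n\n")
    kind == "bytes":     arg is the fixed header length in bytes
    """
    if kind == "lines":
        pos = 0
        for _ in range(arg):
            newline = data.find(b"\n", pos)
            if newline == -1:
                return data, b""  # fewer lines than requested: all header
            pos = newline + 1
        return data[:pos], data[pos:]
    if kind == "prefix":
        pos = 0
        # Consume leading lines for as long as they start with the prefix.
        while data.startswith(arg, pos):
            newline = data.find(b"\n", pos)
            if newline == -1:
                return data, b""
            pos = newline + 1
        return data[:pos], data[pos:]
    if kind == "separator":
        index = data.find(arg)
        if index == -1:
            return b"", data  # no separator found: treat as no header
        end = index + len(arg)
        return data[:end], data[end:]
    if kind == "bytes":
        return data[:arg], data[arg:]
    raise ValueError("unknown header kind: %r" % (kind,))


if __name__ == "__main__":
    header, body = split_header(b"% header1\n% header2\ndata\ndata\n", "prefix", b"%")
    print(header, body)
```

Repeating the header for each block then amounts to writing header + block 
to every job's input.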
