
RE: Design of --header when using --pipe


From: Cook, Malcolm
Subject: RE: Design of --header when using --pipe
Date: Wed, 23 Nov 2011 14:59:38 -0600

Ole,

Great directions.  

I've used --pipe a few times now with .fasta files, using '^>' as the record 
separator.

I think you cover the cases that I've thought about so far.

Also, I've wanted to be able to define blocks in terms of number of lines.  For 
instance, fastq format has a new record every 4 lines.  Is there a way to block 
on line number (candidate block boundaries being the points where the line 
number is divisible by 4)?
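
To make the line-count idea concrete, here is a minimal Python sketch of blocking on line number. It is only an illustration of the request, not how GNU parallel does or would implement it; the function name `line_blocks` and its parameters are hypothetical.

```python
from itertools import islice

def line_blocks(lines, lines_per_record=4, records_per_block=2):
    """Yield lists of lines, cutting only where the line count is a
    multiple of lines_per_record (hypothetical helper, for illustration)."""
    it = iter(lines)
    size = lines_per_record * records_per_block
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

# Five made-up fastq-style records of 4 lines each.
records = []
for i in range(5):
    records += [f"@read{i}\n", "ACGT\n", "+\n", "IIII\n"]

for block in line_blocks(records):
    # Every block starts on a record boundary, so it begins with '@'.
    print(len(block), block[0].strip())
```

With 4 lines per record, every block handed to a worker starts at the first line of a record, and only the last block may hold fewer records than the others.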

Best,

~Malcolm


> -----Original Message-----
> From: parallel-bounces+mec=stowers.org@gnu.org [mailto:parallel-
> bounces+mec=stowers.org@gnu.org] On Behalf Of Ole Tange
> Sent: Wednesday, November 23, 2011 2:46 PM
> To: parallel@gnu.org
> Subject: Design of --header when using --pipe
> 
> I have seen others ask for it and now I have even had use for it
> myself: A way to repeat a header for each block when using --pipe
> 
> If you are processing a big CSV-file and the first line is the column
> names you want this line to be repeated for each block passed to a
> parallel process.
> 
> The simple fix is just to assume that the header is a single line. But
> I think we can do better than that.
> 
> I would like to at least be able to process these 4 types of headers:
> 
> * The CSV-header: A single line. Maybe extended to a given number of
> lines that can be 1?
> 
> * A header that has multiple lines prepended with a special character:
> 
> % header1
> % header2
> data
> data
> 
> * A header that has a symbol dividing header from body. E.g. \n\n in emails:
> 
> From root@alpha.tange.dk Mon Apr 23 10:20:38 2007
> Return-path: <root@alpha.tange.dk>
> From: Anacron <root@alpha.tange.dk>
> To: root@alpha.tange.dk
> Subject: Anacron job 'cron.daily' on alpha
> Message-Id: <E1Hftmo-0001Jn-D5@localhost>
> Date: Mon, 23 Apr 2007 10:20:38 +0200
> 
> data
> data
> 
> * A fixed length header in bytes, so --pipe can process binary data
> with a fixed block length.
> 
> This header is 25 bytes.
> This data is taking up 33 bytes.
> This data is 33 bytes in length.
> Thirty three bytes used for this
> Space for this: 33 bytes incl \n
> 
> Do you have other data files with headers that would require different
> treatment?
> 
> 
> /Ole
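
The four header types listed above can be sketched as four small splitting functions in Python. This is a rough illustration under stated assumptions, not GNU parallel's implementation: all function names are hypothetical, input is treated as bytes, and each function returns a (header, body) pair so the header can be prepended to every block.

```python
def split_fixed_lines(data: bytes, n: int = 1):
    """CSV-style header: the first n lines."""
    pos = 0
    for _ in range(n):
        nl = data.find(b"\n", pos)
        if nl == -1:
            return data, b""          # fewer than n lines: all header
        pos = nl + 1
    return data[:pos], data[pos:]

def split_prefixed(data: bytes, prefix: bytes = b"%"):
    """Header: all leading lines that start with a special character."""
    pos = 0
    while data.startswith(prefix, pos):
        nl = data.find(b"\n", pos)
        if nl == -1:
            return data, b""
        pos = nl + 1
    return data[:pos], data[pos:]

def split_separator(data: bytes, sep: bytes = b"\n\n"):
    """Header: everything up to and including a separator, e.g. the
    blank line in an email. Assumption: no separator means no header."""
    idx = data.find(sep)
    if idx == -1:
        return b"", data
    end = idx + len(sep)
    return data[:end], data[end:]

def split_fixed_bytes(data: bytes, n: int):
    """Header: a fixed number of bytes, for binary data."""
    return data[:n], data[n:]

def repeat_header(header: bytes, blocks):
    """Prepend the header to each block passed to a worker."""
    for block in blocks:
        yield header + block

header, body = split_prefixed(b"% header1\n% header2\ndata\ndata\n")
print(header, body)
```

Returning the header separately keeps the splitting rule independent of how the body is later cut into blocks.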



