Re: [coreutils] join feature: auto-format


From: Pádraig Brady
Subject: Re: [coreutils] join feature: auto-format
Date: Thu, 07 Oct 2010 01:03:19 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 06/10/10 21:41, Assaf Gordon wrote:
> Hello,
> 
> I'd like to (re)suggest a feature for the join program: the ability to 
> automatically build an output format line (similar to, but easier than, 
> using "-o").
> 
> I've previously mentioned it here (but got no favorable responses):
> http://lists.gnu.org/archive/html/bug-coreutils/2009-11/msg00151.html
> 
> Several people have been using this option for a year now (on our local 
> servers), so I thought I might try to suggest it again.
> 
> The full patch is attached, and also available here:
> http://cancan.cshl.edu/labmembers/gordon/files/join_auto_format_2010_10_06.patch
> 
> Here's the common use case:
> 
> Given two tabular files, with a common key in the first column and many 
> numeric (or other) values in the other columns, the user wants to join them 
> together easily.
> One requirement is that empty/missing values should be populated with "00".
> 
> File 1
> ======
> bar 10 13 15 16 11 32
> foo 10 10 11 12 13 14
> 
> 
> File 2
> ======
> bar 99 91 90 93 91 93
> baz 90 91 99 96 97 95
> 
> 
> Desired joined output
> ==============
> bar 10 13 15 16 11 32 99 91 90 93 91 93
> baz 00 00 00 00 00 00 90 91 99 96 97 95
> foo 10 10 11 12 13 14 00 00 00 00 00 00
> 
> There is no technical problem in achieving this; the parameters would be:
> "-a1 -a2 -e 00 -o 0,1.2,1.3,1.4,1.5,1.6,1.7,2.2,2.3,2.4,2.5,2.6,2.7"
> 
> But building the "-o" parameter is cumbersome and error-prone (imagine files 
> with dozens of columns, which is very common in my case).
> 
> The "--auto-format" feature simply builds the "-o" format line automatically, 
> based on the number of columns from both input files.
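
For illustration only, the format line that the quoted message describes could
be generated along these lines. This is a minimal sketch, not the code from the
attached patch: the function names build_join_format and count_columns are
hypothetical, and it assumes whitespace-separated fields, the join key in the
first column of both files, and a column count taken from one sample line of
each file.

```python
def count_columns(line):
    """Count whitespace-separated fields in one sample line (assumed delimiter)."""
    return len(line.split())


def build_join_format(ncols1, ncols2, key_field=1):
    """Build a join "-o" FORMAT string listing every non-key field of both files.

    "0" stands for the join field; "N.M" means field M of file N.
    """
    fields = ["0"]
    for file_no, ncols in ((1, ncols1), (2, ncols2)):
        for col in range(1, ncols + 1):
            if col != key_field:  # the key is already emitted as "0"
                fields.append(f"{file_no}.{col}")
    return ",".join(fields)


if __name__ == "__main__":
    # Sample lines taken from File 1 and File 2 in the message above.
    n1 = count_columns("bar 10 13 15 16 11 32")
    n2 = count_columns("bar 99 91 90 93 91 93")
    print("-a1 -a2 -e 00 -o " + build_join_format(n1, n2))
```

For the two seven-column sample files, this produces the same "-o" list as the
hand-written parameters in the quoted message.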

Thanks for persisting with this and presenting a concise example.
I agree that this is useful and can't think of a simple workaround.
Perhaps the interface would be better as:

-o {all (default), padded, FORMAT}

where padded is the functionality you're suggesting?

cheers,
Pádraig.


