Re: [coreutils] join feature: auto-format


From: Pádraig Brady
Subject: Re: [coreutils] join feature: auto-format
Date: Thu, 07 Oct 2010 11:22:13 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 07/10/10 01:03, Pádraig Brady wrote:
> On 06/10/10 21:41, Assaf Gordon wrote:
>> Hello,
>>
>> I'd like to (re)suggest a feature for the join program - the ability to 
>> automatically build an output format line (similar to, but easier than, 
>> using "-o").
>>
>> I've previously mentioned it here (but got no favorable responses):
>> http://lists.gnu.org/archive/html/bug-coreutils/2009-11/msg00151.html
>>
>> Several people have been using this option for a year now (on our local 
>> servers), so I thought I might try to suggest it again.
>>
>> The full patch is attached, and also available here:
>> http://cancan.cshl.edu/labmembers/gordon/files/join_auto_format_2010_10_06.patch
>>
>> Here's the common use case:
>>
>> Given two tabular files, with a common key at first column, and many numeric 
>> (or other) values on other columns, the user wants to join them together 
>> easily.
>> One requirement is that empty/missing values should be populated with "00".
>>
>> File 1
>> ======
>> bar 10 13 15 16 11 32
>> foo 10 10 11 12 13 14
>>
>>
>> File 2
>> ======
>> bar 99 91 90 93 91 93
>> baz 90 91 99 96 97 95
>>
>>
>> Desired joined output
>> ==============
>> bar 10 13 15 16 11 32 99 91 90 93 91 93
>> baz 00 00 00 00 00 00 90 91 99 96 97 95
>> foo 10 10 11 12 13 14 00 00 00 00 00 00
>>
>> There is no technical problem in achieving this, the parameters would be:
>> "-a1 -a2 -e 00 -o 0,1.2,1.3,1.4,1.5,1.6,1.7,2.2,2.3,2.4,2.5,2.6,2.7"
>>
>> But building the "-o" parameter is cumbersome and error-prone (imagine 
>> files with dozens of columns, which is very common in my case).
>>
>> The "--auto-format" feature simply builds the "-o" format line 
>> automatically, based on the number of columns from both input files.
> 
> Thanks for persisting with this and presenting a concise example.
> I agree that this is useful and can't think of a simple workaround.
> Perhaps the interface would be better as:
> 
> -o {all (default), padded, FORMAT}
> 
> where padded is the functionality you're suggesting?

Thinking more about it, we mightn't need any new options at all.
Currently -e is redundant if -o is not specified.
So how about changing that so that if -e is specified
we operate as above by auto inserting empty fields?
Also I wouldn't base the padding on the number of fields in the first line,
instead auto padding to the biggest number of fields
on the current lines under consideration.
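
For reference, the "-o" list that has to be written by hand today can be
generated mechanically. Below is a minimal Python sketch of that step. It is
not taken from the attached patch: the helper name auto_format is
hypothetical, and it assumes the join field is field 1 in both files and that
the field counts come from the first line of each file (the very assumption
questioned above).

```python
def auto_format(fields1, fields2, key1=1, key2=1):
    """Build a join -o list: the join field, then every non-key field
    of file 1, then every non-key field of file 2.

    Hypothetical helper for illustration; not part of coreutils.
    """
    spec = ["0"]  # "0" is join's notation for the join field itself
    spec += [f"1.{i}" for i in range(1, fields1 + 1) if i != key1]
    spec += [f"2.{i}" for i in range(1, fields2 + 1) if i != key2]
    return ",".join(spec)


# Assumption: count fields on the first line of each example file.
n1 = len("bar 10 13 15 16 11 32".split())
n2 = len("bar 99 91 90 93 91 93".split())

fmt = auto_format(n1, n2)
print(f"-a1 -a2 -e 00 -o {fmt}")
```

For the two example files this yields the same "-o" list as the one quoted
above. If a later line has more fields than the first, a list built this way
silently drops the extras, which is why padding to the widest line seems
preferable.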

cheers,
Pádraig.


