coreutils

Re: [coreutils] join feature: auto-format


From: Pádraig Brady
Subject: Re: [coreutils] join feature: auto-format
Date: Fri, 07 Jan 2011 13:03:13 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 06/01/11 12:05, Pádraig Brady wrote:
> On 07/10/10 19:25, Pádraig Brady wrote:
>> On 07/10/10 18:43, Assaf Gordon wrote:
>>> Pádraig Brady wrote, On 10/07/2010 06:22 AM:
>>>> On 07/10/10 01:03, Pádraig Brady wrote:
>>>>> On 06/10/10 21:41, Assaf Gordon wrote:
>>>>>>
>>>>>> The "--auto-format" feature simply builds the "-o" format line 
>>>>>> automatically, based on the number of columns from both input files.
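As a rough illustration of what such a feature would compute, the sketch below builds the `-o` list from the field count of each file's first line, as the proposal describes. `make_join_format` is a hypothetical helper, not part of coreutils. It assumes whitespace-separated fields and that the join field is field 1 in both files.

```shell
#!/bin/sh
# make_join_format FILE1 FILE2  (hypothetical helper, not part of coreutils)
# Prints a join -o list: the join field (0), then every non-join field of
# FILE1 and FILE2, counted from the first line of each file.
# Assumes whitespace-separated fields and the join field in column 1 of both.
make_join_format() {
  n1=$(head -n 1 "$1" | awk '{ print NF }')
  n2=$(head -n 1 "$2" | awk '{ print NF }')
  fmt=0
  i=2
  while [ "$i" -le "${n1:-0}" ]; do   # fields 2..n1 of file 1
    fmt="$fmt,1.$i"
    i=$((i + 1))
  done
  i=2
  while [ "$i" -le "${n2:-0}" ]; do   # fields 2..n2 of file 2
    fmt="$fmt,2.$i"
    i=$((i + 1))
  done
  printf '%s\n' "$fmt"
}
```

It would be used as `join -a1 -a2 -e. -o "$(make_join_format file1 file2)" file1 file2`. Basing the count on the first line only is the weak point discussed further down the thread: a short first line yields too few output fields.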
>>>>>
>>>>> Thanks for persisting with this and presenting a concise example.
>>>>> I agree that this is useful and can't think of a simple workaround.
>>>>> Perhaps the interface would be better as:
>>>>>
>>>>> -o {all (default), padded, FORMAT}
>>>>>
>>>>> where padded is the functionality you're suggesting?
>>>>
>>>> Thinking more about it, we mightn't need any new options at all.
>>>> Currently -e is redundant if -o is not specified.
>>>> So how about changing that so that if -e is specified
>>>> we operate as above by auto inserting empty fields?
>>>> Also I wouldn't base it on the number of fields in the first line,
>>>> instead auto padding to the biggest number of fields
>>>> on the current lines under consideration.
>>>
>>> My concern is the principle of "least surprise" - if there are existing 
>>> scripts/programs that specify "-e" without "-o" (doesn't make sense, but 
>>> still possible) - this change will alter their behavior.
>>>
>>> Also, implying/forcing 'auto-format' when "-e" is used without "-o" might 
>>> be a bit confusing.
>>
>> Well seeing as -e without -o currently does nothing,
>> I don't think we need to worry too much about changing that behavior.
>> Also to me, specifying -e EMPTY implicitly means I want
>> fields missing from one of the files replaced with EMPTY.
>>
>> Note POSIX is more explicit, and describes our current operation:
>>
>> -e EMPTY
>>   Replace empty output fields in the list selected by -o with EMPTY
>>
>> So changing that would be an extension to POSIX.
>> But I still think it makes sense.
>> I'll prepare a patch soon, to do as I describe above,
>> unless there are objections.
> 
> The attached patch changes `join` (from what's done on other platforms) so that...
> 
> `join -e` will automatically pad missing fields from one file
> so that the same number of fields are output from each file.
> Previously -e was only used for missing fields specified with -o or -j.
> 
> With this change join now does:
> 
> $ cat file1
> a 1 2
> b 1
> d 1 2
> 
> $ cat file2
> a 3 4
> b 3 4
> c 3 4
> 
> $ join -a1 -a2 -1 1 -2 1 -e. file1 file2
> a 1 2 3 4
> b 1 . 3 4
> c . . 3 4
> d 1 2 . .
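For comparison, the output of this first example can already be produced without the patch by listing every output field explicitly with `-o`, where `0` selects the join field. The catch is that the field counts of both files must be known in advance. The snippet recreates the two sample files in the current directory and assumes GNU `join`.

```shell
#!/bin/sh
# Recreate the sample inputs shown above.
printf 'a 1 2\nb 1\nd 1 2\n' > file1
printf 'a 3 4\nb 3 4\nc 3 4\n' > file2

# Spell out every output field (0 = join field) so that -e has fields to fill.
join -a1 -a2 -1 1 -2 1 -e. -o 0,1.2,1.3,2.2,2.3 file1 file2
```

This prints the same four lines as the patched `join -e.` invocation.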
> 
> $ join -a1 -a2 -1 1 -2 4 -e. file1 file2
> . . . . a 3 4
> . . . . b 3 4
> . . . . c 3 4
> a 1 2 . .
> b 1 .
> d 1 2 . .
> 
> $ join -a1 -a2 -1 4 -2 1 -e. file1 file2
> . a 1 2 . . .
> . b 1 . .
> . d 1 2 . . .
> a . . 3 4
> b . . 3 4
> c . . 3 4
> 
> $ join -a1 -a2 -1 4 -2 4 -e. file1 file2
> . a 1 2 a 3 4
> . a 1 2 b 3 4
> . a 1 2 c 3 4
> . b 1 . a 3 4
> . b 1 . b 3 4
> . b 1 . c 3 4
> . d 1 2 a 3 4
> . d 1 2 b 3 4
> . d 1 2 c 3 4
> 
> While -e without -o was previously a no-op, and so could IMHO safely be extended,
> this will also change the behavior when both -e and -j are specified.
> Previously if -j > 1 was specified, and that field was missing,
> then -e would be used in its place, rather than the empty string.
> This still does that, but also does the padding.
> Without the -j issue I'd be 80:20 for just extending -e to auto pad,
> but given -j I'm 50:50.  The alternative is to select this with
> say '-o padded', but that's less discoverable, and complicates
> the interface somewhat.

Considering this more, I think it's safer to auto pad only
when '-o padded' is specified. I notice the plan9 `join` man page
has an example that uses -e '' to explicitly specify the empty string as filler,
which would have triggered our auto pad had we left it as above.

cheers,
Pádraig.


