[coreutils] join feature: auto-format


From: Assaf Gordon
Subject: [coreutils] join feature: auto-format
Date: Wed, 06 Oct 2010 16:41:09 -0400

Hello,

I'd like to (re)suggest a feature for the join program: the ability to 
automatically build an output format line (similar to, but easier than, using "-o").

I've previously mentioned it here (but got no favorable responses):
http://lists.gnu.org/archive/html/bug-coreutils/2009-11/msg00151.html

Several people have been using this option for a year now (on our local 
servers), so I thought I might try to suggest it again.

The full patch is attached, and also available here:
http://cancan.cshl.edu/labmembers/gordon/files/join_auto_format_2010_10_06.patch

Here's the common use case:

Given two tabular files with a common key in the first column and many numeric 
(or other) values in the other columns, the user wants to join them together easily.
One requirement is that empty/missing values should be populated with "00".

File 1
======
bar 10 13 15 16 11 32
foo 10 10 11 12 13 14


File 2
======
bar 99 91 90 93 91 93
baz 90 91 99 96 97 95


Desired joined output
==============
bar 10 13 15 16 11 32 99 91 90 93 91 93
baz 00 00 00 00 00 00 90 91 99 96 97 95
foo 10 10 11 12 13 14 00 00 00 00 00 00

There is no technical problem in achieving this. The parameters would be:
"-a1 -a2 -e 00 -o 0,1.2,1.3,1.4,1.5,1.6,1.7,2.2,2.3,2.4,2.5,2.6,2.7"

But building the "-o" parameter is cumbersome and error-prone (imagine files 
with dozens of columns, which is very common in my case).
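
For reference, the explicit invocation can be reproduced with the two sample
files above. This is only a sketch: the scratch directory and the file names
"file1" and "file2" are placeholders chosen for illustration, and the inputs
are assumed to be already sorted on the key column.

```shell
# Recreate the two sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'bar 10 13 15 16 11 32' 'foo 10 10 11 12 13 14' > "$tmpdir/file1"
printf '%s\n' 'bar 99 91 90 93 91 93' 'baz 90 91 99 96 97 95' > "$tmpdir/file2"

# -a1 -a2: also print unpairable lines from both files
# -e 00:   fill missing fields with "00"
# -o ...:  explicit output format (key, then fields 2-7 of each file)
joined=$(join -a1 -a2 -e 00 \
    -o 0,1.2,1.3,1.4,1.5,1.6,1.7,2.2,2.3,2.4,2.5,2.6,2.7 \
    "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

Every extra column in either file adds one more entry to the "-o" list, which
is where the list becomes hard to maintain by hand.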

The "--auto-format" feature simply builds the "-o" format line automatically, 
based on the number of columns from both input files.
The auto-generated format order is: Key-column, all columns (except key) from 
first file, all columns (except key) from second file.
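
Without the patch, the same format order can be approximated in a shell
wrapper. The sketch below is not the patch's implementation; "build_format"
is a hypothetical helper name, and it assumes whitespace-separated input, the
key in column 1, and a first line that has the same number of columns as the
rest of the file.

```shell
# Hypothetical helper: print a "-o" format list for two whitespace-separated
# files joined on column 1. Only the first line of each file is inspected.
build_format() {
    n1=$(awk 'NR == 1 { print NF; exit }' "$1")
    n2=$(awk 'NR == 1 { print NF; exit }' "$2")
    awk -v n1="$n1" -v n2="$n2" 'BEGIN {
        fmt = "0"                                    # the join key
        for (i = 2; i <= n1; i++) fmt = fmt ",1." i  # non-key columns, file 1
        for (i = 2; i <= n2; i++) fmt = fmt ",2." i  # non-key columns, file 2
        print fmt
    }'
}

# Possible usage, with file1 and file2 as placeholder names:
#   join -a1 -a2 -e 00 -o "$(build_format file1 file2)" file1 file2
```

The wrapper has to read each file's first line before join runs, so it does
not work when an input is a pipe.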

The parameters for the above use case become:
"-a1 -a2 -e 00 --auto-format"

If "--auto-format" is not specified, there's no change to the rest of the 
workflow.
If both "--auto-format" and "-o XXXX" are specified, the "-o" takes precedence.

Please let me know what you think about it.
Best regards,
 -gordon

Attachment: join_auto_format_2010_10_06.patch
Description: Text Data