bug-coreutils

Re: uniq - tab delimited output (feature req)


From: Bob Proulx
Subject: Re: uniq - tab delimited output (feature req)
Date: Wed, 11 Oct 2006 22:54:34 -0600
User-agent: Mutt/1.5.9i

Patrick Tufts wrote:
> My apologies if this email address is for bugs only. This is a feature
> request (but I feel that the lack of the feature approaches bug status).

This is a feature discussion list as well.  I just made that more
clear on the coreutils web page.

  http://www.gnu.org/software/coreutils/

> uniq should have an option so it creates tab delimited output instead of
> space delimited.
> 
> Rationale:
> 
> The coreutils are often piped together. In the following example,
> foo.txt is a tab delimited file:
> 
> cut -f 1,2 foo.txt | sort -k 2,2 | uniq -c > bar.txt
> 
> A subsequent sort or join operation on bar.txt may not pick up on fields
> correctly, if the fields in foo.txt contain spaces. I can specify the
> field separator for sort or join to remove the ambiguity, but this only
> works if the delimiter is consistent. 

Unfortunately your example produces output from uniq but stops short
of showing that output being used.  Could you provide an example where
you would be daisy-chaining 'uniq -c' into a subsequent filter command?
I am unable to think of a good example of this off the top of my head,
but I am sure you have several good cases to draw from.

> uniq -c introduces an inconsistency because it creates space
> delimited output. This is an inconsistency unique to uniq -c among
> the coreutils. No other coreutil that I've used, including uniq
> without the "-c", has this behavior.

I am not sure I agree with that statement.  If anything, the tab
delimited filters seem to be the odd ones.  Many of those filters were
cobbled together quickly and only through longevity reached
standardization to the point that they don't change any more.  Thirty
years is a long time for any software program to survive, let alone
thrive.

It is interesting to note that the V7 join manual said this in the
bugs section:

    The conventions of join, sort, comm, uniq, look and awk(1) are
    wildly incongruous.

But none of those required tab delimited fields.

> This special case behavior often trips me up, and I suspect it does so
> for other users as well. I find myself writing shims to turn uniq -c
> output into tab delimited output just to make uniq work with other
> coreutils (sort, join, cut).

But sort and join both work on whitespace delimited fields, not tab
delimited fields.  The cut program is the only one with a default
delimiter of tab, and then only for the field option.  I dare say that
the primary purpose of cut is to cut by column.
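
As a rough illustration of those defaults, here is a small sketch.  The
two sample records are made up for the purpose, and LC_ALL=C is set only
so the ordering does not depend on locale.  With no -t option sort starts
a new field at each blank, while -t with a tab keeps "b a" and "a b"
together as one field; cut -f splits on tab without being told to.

```shell
# A literal tab character, obtained portably.
tab=$(printf '\t')

# Two tab-delimited records; the first field of each contains a space.
data=$(printf 'b a\t2\na b\t1\n')

# Default separators: field 2 starts at the first blank, so the keys
# are " a" and " b" and the "b a" record sorts first.
by_blank=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2)

# Explicit tab separator: field 2 is the number, so the "a b" record
# sorts first.
by_tab=$(printf '%s\n' "$data" | LC_ALL=C sort -t "$tab" -k 2,2)

# cut -f needs no -d here because its default delimiter is already tab.
first_fields=$(printf '%s\n' "$data" | cut -f 1)

printf 'default sort:\n%s\n' "$by_blank"
printf 'sort -t TAB:\n%s\n' "$by_tab"
printf 'cut -f 1:\n%s\n' "$first_fields"
```

The same data gives a different order depending on whether the separator
is stated, which is the ambiguity under discussion.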

> It seems that this would be better handled if uniq had a flag to specify
> the column separator on a uniq -c
> 
> Perhaps something like sort's -t flag, except used to specify the output
> separator.

That does not seem unreasonable.  If you could provide some example
cases where use of that option would be useful I think it would help to
convince the maintainers of its usefulness.
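
For reference, the kind of shim being described can be sketched roughly
as follows.  This is only an assumption about what such a workaround
looks like, not an existing uniq option; the function name count_tab and
the sample input are hypothetical.  It relies on 'uniq -c' printing
optional leading spaces, the count, and a single space before the line.

```shell
# A literal tab character, obtained portably.
tab=$(printf '\t')

# Hypothetical helper: count duplicate lines on stdin and emit
# "COUNT<TAB>LINE" instead of the space-padded 'uniq -c' format.
count_tab() {
    LC_ALL=C sort | uniq -c | sed "s/^ *\([0-9][0-9]*\) /\1${tab}/"
}

# Sample input in which one value contains a space.
result=$(printf 'New York\nBoston\nNew York\n' | count_tab)
printf '%s\n' "$result"
```

Because only the first separator is rewritten, a value such as "New
York" stays intact and a later 'cut -f 2' or 'sort -t' with a tab can
pick it out as one field.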

Bob