bug-coreutils

bug#6780: Problem with the cut command


From: Bob Proulx
Subject: bug#6780: Problem with the cut command
Date: Mon, 2 Aug 2010 14:30:24 -0600
User-agent: Mutt/1.5.18 (2008-05-17)

tags 6780 + wishlist
retitle 6780 Add cut multi-character/expression delimiters
thanks

Bill wrote:
> I'm not sure if this is a bug, a question or a feature request,
> but there is a problem with the cut command, specifically with
> its delimiter option '-d'.
> 
> In older times disk space was scarce and every byte was 
> conserved. Fields in data files were delimited with a single
> character such as ':'. This practise continues today. But 
> sometimes it does not and fields in some files are separated
> with multiple characters. Space is no longer precious.

Sure.  But I think none of that is relevant to changing stable program
interfaces and behavior.  It is, however, a good argument for creating
a new program that has no legacy to preserve.  The world is wide open
for adding new programs.  Feel free to go for it there.

> Suppose I wish to import information about a disk partition
> into my backup script. I want to assign the type of filesystem
> to a variable. Compare the output of these two commands.
> 
> cat /etc/fstab |grep home | cut -d ' ' -f3
> yields a blank output line

It is data dependent.  The output depends upon what you have as input.
For some files it would be one way and for others a different way.
But that just points out that cut is the wrong tool for the task.  As
your note shows you are well aware, cut works with single-character
delimiters, and it treats every occurrence of the delimiter as a field
boundary.  But the fstab may separate fields with runs of whitespace,
which cut sees as a series of empty fields.  This makes cut an
inappropriate tool for the job.
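
To make the empty-field behavior concrete, here is a small sketch.  The
sample line is made up for illustration and only stands in for a real
fstab entry; the variable names are mine as well.

```shell
# Hypothetical fstab-style line with runs of spaces between fields.
line='/dev/sda3   /home   ext4   defaults   0 2'

# cut treats every single space as a delimiter, so the three spaces
# after the device name yield two empty fields (fields 2 and 3).
cut_field3=$(printf '%s\n' "$line" | cut -d ' ' -f3)

# awk's default field splitting collapses runs of blanks into one
# separator, so field 3 is the filesystem type.
awk_field3=$(printf '%s\n' "$line" | awk '{print $3}')

printf 'cut: [%s]\nawk: [%s]\n' "$cut_field3" "$awk_field3"
```

With this input cut prints an empty third field while awk prints ext4,
which matches the blank line reported above.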

> cat /etc/fstab |grep opt | awk -F " " '{print $3}'
> yields the desired output - reiserfs.

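Awk is a much better tool for the task.  But the inefficiencies
present in that command line are many.  The cat and the grep are both
unnecessary since awk can read the file and match the pattern itself,
and -F " " only restates awk's default field splitting.  There are
much better ways.  Try this instead.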

  awk '/opt/{print$3}' /etc/fstab

However, that doesn't account for comments that may also match.  To
avoid problems, comments should be removed first.

  awk '/#/{gsub("#.*","")}/opt/{print$3}' /etc/fstab

And I am inclined to say that it is better to just match on a
particular field.

  awk '/#/{gsub("#.*","")}$2=="/opt"{print$3}' /etc/fstab
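
As a rough check of the difference between these forms, the sketch
below runs the first and last commands against a temporary file.  The
file contents are invented sample data, not anyone's real fstab.

```shell
# Hypothetical sample data standing in for /etc/fstab.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
# /dev/sdb1  /opt  ext3  defaults  0 2   (old entry, disabled)
/dev/sda5    /opt       reiserfs  defaults  0 2
/dev/sda6    /home      ext4      defaults  0 2   # /opt lives elsewhere
/dev/sda7    /opt/data  xfs       defaults  0 2
EOF

# Plain pattern match: every line containing "opt" is selected,
# including the commented-out entry and the trailing comment.
loose=$(awk '/opt/{print $3}' "$fstab")

# Strip comments first, then require the mount point field to be /opt.
strict=$(awk '/#/{gsub("#.*","")} $2=="/opt"{print $3}' "$fstab")

printf 'loose:\n%s\nstrict: %s\n' "$loose" "$strict"
rm -f "$fstab"
```

On this data the plain match selects all four lines, while the
field match prints only reiserfs.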

> The problem is that the cut command can't handle multiple 
> instances of the same delimiter. It's designed to handle
> a single character like ':', but can't cope with repeating
> characters like '::' or a series of spaces as in /etc/fstab.

All correct.  The cut command is not the appropriate tool for your
task.

> So my question is shouldn't the cut delimiter handle 
> multiple instances of the same character internally or 
> failing that, shouldn't there be some way of specifying a 
> series of single delimiter characters such as -d':'+  ?

In my opinion no, it should not.  It is feature creep and code bloat.
Cut is not just used on large servers and large desktops but also on
wristwatches and toaster ovens.  Should the increase in size be
multiplied by every system in the known universe?  And even if this
feature were added to cut, the program would still be insufficient for
the task since it has no capability to handle comments or line
selection (although your combination with grep is fine with me, good
in fact, though sed would be better since it enables checking return
status).  Furthermore, the feature is already implemented and fully
supported by awk.  Using awk is a much better fit than using cut.  The
solution already exists in awk and therefore is not needed in cut.
The awk program is standardized and portable.  To me awk is the
best-in-class tool for this task.
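
For use in a backup script, the comment handling, line selection, and a
checkable return status can all live in one awk call.  This is only a
sketch using awk rather than sed; the function name fstype_of and its
optional file argument are hypothetical.

```shell
# Hypothetical helper: print the filesystem type for a mount point and
# return non-zero when no entry matches.
# Usage: fstype_of MOUNTPOINT [FILE]   (FILE defaults to /etc/fstab)
fstype_of() {
  awk -v mp="$1" '
    /#/ { gsub("#.*", "") }            # drop comments
    $2 == mp { print $3; found = 1 }   # match on the mount point field
    END { exit !found }                # exit status 1 if nothing matched
  ' "${2:-/etc/fstab}"
}
```

A script could then write something like
if fstype=$(fstype_of /opt); then ... fi
and handle the missing-entry case in the else branch.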

Bob




