bug-coreutils

Re: Feature request for cut and/or sort


From: Bob Proulx
Subject: Re: Feature request for cut and/or sort
Date: Mon, 23 Jul 2007 06:37:07 +0200
User-agent: Mutt/1.5.9i

The Wanderer wrote:
> (&*%&*!@ incorrect reply behaviour... I *hate* having to type out the
> address by hand in every post.)

Since you are talking to someone who *strongly* opposes munging the
Reply-To: header, your distress is noted as theatrical, and also
self-inflicted, since you could easily do a group-followup-to-all.  :-/

> Bob Proulx wrote:
> >The Wanderer wrote:
> How about for sort?

Hmm...  sort...  I guess you just have to count the fields. (shrug)
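For what it is worth, sort can already address a counted field with
-t and -k, so counting from the start works whenever the depth is
known.  A minimal sketch (assuming every line has the same number of
components; the leading / makes field 1 empty, so the second path
component is field 3):

```shell
# Sort slash-separated paths by their second component (field 3
# with -t/, because the leading slash creates an empty field 1).
printf '%s\n' /a/zz/1 /b/mm/2 /c/aa/3 | sort -t/ -k3,3
```

With input of a different depth the field number would of course
have to change, which is exactly the counting chore in question.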

> >That is correct.  But cut is really not the best tool for the job.
> 
> In that case, it probably is not the best tool for the job of cutting
> from the beginning of the line, either. Are there any other than
> historical reasons ("it's become standard, and people expect it to be
> there") for not removing it entirely?

I would happily do without 'cut' but after thirty years of having it
around is there any reason not to simply leave it?  As mom says, don't
poke at it.

> (Naturally, I don't want it removed. The point is that the fact that
> there are better tools available for a given task does not inherently
> mean it is not worthwhile to have "worse" tools capable of performing
> the same task.)

But in the case of awk both the number of characters needed to use it
and the "obviousness" of how it is used are quite good.  It is also
very useful to know, so the effort spent on basic usage pays itself
back in increased functionality very quickly.

> >Instead I recommend and use awk for these types of things.
> >
> >  echo /path/to/somefile | awk -F/ '{print$NF}'
> 
> I've never had occasion (or, beyond the general availability Somewhere
> of documentation, opportunity) to learn awk. In any case a program to
> which I have to pass an esoteric incantation is less convenient than one
> to which I can simply pass arguments to simple options.

I think I will simply have to agree to disagree.  In basic usage awk
is quite simple to learn and use and well worth learning.  I rarely
write long awk programs anymore though because I prefer ruby, and
before that perl, so much more.  These days my use of awk is mostly
one-liner commands of the basic 'grep' and 'column printing' type.

> How well will these work in cases where there is other, extraneous data
> on the line before the path begins?

Examples please?  Bilbo's riddle is quite a bit too challenging for me
today.

But pretty much in any extensive case I would use 'sed' to do any
complicated stream editing.  Again, it is very standard and would work
on any posix platform.  So I will jump ahead and suggest it.
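For instance, keeping only the final two path components, whatever
precedes them, is a single sed substitution (a sketch of the idea):

```shell
# Keep only the last two slash-separated components of a path.
# The greedy .*/ consumes everything up to the second-to-last
# slash; the captured group keeps the final two fields intact.
echo /one/two/three/four | sed 's|.*/\([^/]*/[^/]*\)$|\1|'
```

Because the leading `.*` is greedy, this works no matter how many
components come before the last two.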

> (Regardless, the 'find' solution will not work in most of the cases I
> see, because the program which outputs the data I need to parse does not
> and cannot be made to null-terminate.)

The 'find' example was merely to generate a largish number of
pathnames quickly and easily and without a lot of setup.

> >You say that you want the second from the end?  Subtract the number
> >from the end.
> >
> >  $ echo /one/two/three/four | awk -F/ '{print$(NF-1)}'
> >  three
> 
> How about the final two, including their separator?

  $ echo /one/two/three/four | awk -F/ '{print$(NF-1)"/"$NF}'
  three/four

On the command line I jam everything together without spaces just like
an old Fortran programmer.  But in scripts I do use whitespace as
appropriate to increase readability.  This might be easier to read.

  $ echo /one/two/three/four | awk -F/ '{ print $(NF-1) "/" $NF }'
  three/four

> (I kind of expect a "RTFM awk" here. The point, however, is that this is
> much less convenient and intuitive to *find out about* - much less to
> use - than are the options to cut.)

I would rather make it attractive to you such that you *want* to learn
more about it.  But I can see that I have already failed.

Language shapes the way people think[1].  If the only tool available
is a hammer then all problems look like nails.  If the only tool
available to you is 'cut' then sure, expand it infinitely.  But then
eventually 'cut' will look like Ada or C++, with features included
solely because their absence would offend someone.

Fortunately shell programmers have a rich set of tools available and
are not limited to simply using 'cut'.  Tools such as 'awk' and 'sed'
and others are too good to want to avoid learning.  Personally I enjoy
learning about alternative ways to do things.  It makes programming a
joy and not a drudgery.

> This does not by itself address doing the same thing with programs other
> than cut, though the much more terse message from Andreas Schwab seems
> to indicate that there are ways to accomplish it generically, but it
> does provide at least minimal incentive for me to attempt to learn awk.
> (I'm having enough trouble with sed and bash, and haven't improved at C
> in much of a decade - attempting to gain reflexive-recall mastery of
> another language is not a pleasant prospect...)

At least in the one-liner form it lends itself to use very quickly.
This covers 99.44% of everything you need to know.

  awk '/RE-pattern/{print $NUMBER}'
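Filled in with a concrete pattern and field number (the input here is
made up purely for illustration), it reads like a grep and a column
selection combined:

```shell
# Print field 3 of only those lines matching the pattern:
# here, the third column of lines that start with "eth".
printf 'eth0 up 100\nlo   up 10\neth1 down 0\n' | awk '/^eth/{print $3}'
```

Either half can be dropped: omit the pattern to print the column from
every line, or omit the action to print matching lines whole.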

> >Nah...  Just use awk.  It is standard, portable and already does what
> >you ask along with many more features.  The syntax of doing this
> >with awk is quite obvious.  It is short and quick to type when doing
> >it on the command line.
> 
> The same argument could, presumably, be provided for the functions for
> which cut does provide options. The advantages of having a separate
> utility appear to be that it is more convenient to use quickly and is
> easier to discover and learn.

I was not opposed to seeing fields-from-the-end added to cut.  I was
opposed to using cut at all.  :-)

> I do consider myself a comparatively advanced user (having built my own
> system from parts and administered it more or less independently for
> years), although I am nowhere near anything like mastery yet, and I
> still find the (reputedly both complex and powerful) languages whose use
> seems to be the standard response to requests for enhancement to these
> tools to be intimidating.

Hmm...  But if a complex tool intimidates, and by being too complex
prevents people from using it, then keeping the core utilities simple
should be a prime guideline so that they remain usable in the future.
If there were no resistance to adding features then the utilities
would become very cluttered and would turn into exactly the kind of
complex program you are now objecting to.

The 'cp' and 'rsync' programs come to mind.  I know that you are
talking about cut and not cp but it is illustrative of the issue.
Often people ask for features in cp that already exist in rsync.  The
rsync program has a long list of features and continues to evolve.
Right now most people do not use rsync when they simply want to copy a
file from here to there.  Why not?  It is perfectly capable of the
task.  And it will do a zillion other tasks too.

I think many would answer because rsync is too complex for that task
or too bloated or too slow or other answers indicating that they used
cp because it was simple, direct and to the point.  What program would
people be able to use if cp became the same as rsync?  There would in
that case no longer be a simple program to fall back upon.

Note that while I am using rsync as a good example of a program that
has evolved into a very complex and intimidating one through the
addition of many features, I also really like rsync.  Often it is the
perfect tool for the job.

> Expecting a more basic user - who may have been lucky in stumbling
> across e.g. cut or sort at all - to spend the time to learn them,
> when these capabilities seem natural extensions of the abilities the
> tools already have, seems to me like at best a dubious and at worst
> a damagingly elitist position.

Arguably, programs such as awk are part of the set of tools that
every shell programmer should know.  By suggesting awk (and at other
times find-xargs and such) I am not trying to be elitist but simply
trying to pass along knowledge of the rich set of tools in common use
for shell programming.  The problem is that shell scripting is not a
closed system.

The biggest issue I have with cut in practical use is the rigid
definition of fields separated by single TAB characters.  Awk, Perl,
and Ruby all have a more liberal definition of fields separated by
runs of whitespace.  (Perl was not always that way but evolved into
it.  The default behavior of Perl's 'split' with no arguments used to
be like cut and is now like awk.)

  $ echo "  one  two three" | cut -d' ' -f3
  one

  $ echo "  one  two three" | awk '{print$3}'
  three

Because in 'cut' every single delimiter separates fields, the two
leading spaces in the example above produce two empty fields, making
"one" field number three.  Perfectly valid, but not usually what is
wanted.
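One common workaround, when cut must be used anyway, is to normalize
the whitespace first; a sketch:

```shell
# Collapse runs of spaces to one and drop the leading space so
# that cut's one-delimiter-per-field rule matches the visible
# columns.  (The BRE "  *" matches one or more spaces.)
echo "  one  two three" | sed 's/  */ /g; s/^ //' | cut -d' ' -f3
```

Which is more typing than the awk version above, and one reason I
reach for awk in the first place.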

Bob

[1] Man's thought is shaped by his tongue.
    http://en.wikipedia.org/wiki/Language_and_thought



