Re: cut -DF


From: Pádraig Brady
Subject: Re: cut -DF
Date: Thu, 6 Jan 2022 14:35:16 +0000

Thanks for taking the time to consolidate options/functionality
across different implementations.  This is important for users.
Some notes below...

On 05/01/2022 16:23, Rob Landley wrote:
> Around 5 years ago toybox added the -D, -F, and -O options to cut:
>
>      -D  Don't sort/collate selections or match -fF lines without delimiter
>      -F  Select fields separated by DELIM regex
>      -O  Output delimiter (default one space for -F, input delim for -f)
>
> This lets you do:
>
>    $ echo one two three four five six seven eight nine | cut -DF 7,1-3,2
>    seven one two three two
>
> -F is a regex version of -f (defaulting to "match a run of whitespace")
>
> -D says to show the raw matches in the order requested (and ONLY those matches,
> it doesn't pass through lines with no matches)
>
> -O is -d for output.
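
For comparison, where awk is available the reordering in the quoted example can be reproduced with explicit field references. This is only a sketch of an equivalent result, assuming a POSIX awk; it says nothing about how toybox implements -DF. The variable names are illustrative.

```shell
# Print fields 7, 1-3 and 2 in the requested order, repeating field 2.
# awk splits on runs of whitespace by default and joins the printed
# fields with a single space.
line='one two three four five six seven eight nine'
reordered=$(printf '%s\n' "$line" | awk '{ print $7, $1, $2, $3, $2 }')
printf '%s\n' "$reordered"    # seven one two three two
```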

Cool. I agree that the functionality is useful,
especially in places where awk may not be available.

As I see it, the main functionalities added here are:
  - reordering of selected fields
  - adjusted suppression of lines without matching fields
  - regex delimiter support

I see regex support as less important, but still useful.
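
To make the first two points concrete, here is how plain -f behaves today. This is a small sketch using only standard cut options (-d, -f, -s); the variable names are illustrative.

```shell
line='one two three four five six seven eight nine'

# Plain -f emits fields in input order and only once each,
# regardless of how the list is written.
sorted=$(printf '%s\n' "$line" | cut -d' ' -f 7,1-3,2)
printf '%s\n' "$sorted"          # one two three seven

# A line that lacks the delimiter is passed through unchanged...
passed=$(printf 'nodelimiter\n' | cut -d' ' -f 2)
printf '%s\n' "$passed"          # nodelimiter

# ...unless -s is given, in which case it is suppressed.
suppressed=$(printf 'nodelimiter\n' | cut -d' ' -s -f 2)
printf '[%s]\n' "$suppressed"    # []
```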


> You need all three because -F is useful by itself, and -F needs -O because when
> you're matching a regex it's not clear what to output. (Does
> "echo -e 'one\ttwo three' | cut -DF 3,1" glue them together with what's before
> match 1 (nothing), what's after match 3 (nothing), or an arbitrarily chosen one
> of the two different splits in between?)
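
The ambiguity can be seen with awk, which faces the same question and answers it by always joining output fields with its own output separator (OFS). This is a sketch assuming a POSIX awk; it illustrates why a separate output delimiter is needed and is not a description of the toybox code.

```shell
input=$(printf 'one\ttwo three')   # fields separated by a tab, then a space

# Fields 3 and 1 were never adjacent in the input, so neither the tab nor
# the space is an obvious choice of separator. awk joins with OFS instead
# (one space by default), which is the role -O plays for cut.
default_join=$(printf '%s\n' "$input" | awk '{ print $3, $1 }')
custom_join=$(printf '%s\n' "$input" | awk -v OFS=':' '{ print $3, $1 }')
printf '%s\n' "$default_join"   # three one
printf '%s\n' "$custom_join"    # three:one
```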

> Elliott Hughes (the Android base OS maintainer) asked if I could get the feature
> more widely adopted:
>
>    http://lists.landley.net/pipermail/toybox-landley.net/2021-June/012453.html
>
> > your non-POSIX cut(1) extension covers 80% of the in-the-wild use of awk
> > anyway :-) if you still talk to any of the busybox folks, we should suggest
> > they copy that --- it would be nice for it to be a de facto standard so we
> > can get it into POSIX sometime around the 2040s... (and have made lives
> > better for the folks who don't care about standards and just want to "get
> > things done" in the intervening decades!)
>
> So I offered to implement it in busybox:
>
>    http://lists.busybox.net/pipermail/busybox/2021-June/088886.html
>
> And the busybox maintainer merged it here:
>
>    https://git.busybox.net/busybox/commit/?id=0068ce2fa0e3
>
> This is working and in use in Android, and now in busybox, and it would simplify
> my regression test suite if coreutils was in sync, so I thought I'd ask if you
> were interested.
>
> Thanks,
>
> Rob
>
> P.S. Somebody submitted a proposal to do this to POSIX way back when (see end of
> rationale at https://pubs.opengroup.org/onlinepubs/9699919799/utilities/cut.html
> -- they replied that they only standardize existing features, not take
> suggestions for new ones nobody's implemented yet). If I'd noticed I'd have used
> -o instead of -D at the time, but whoever suggested it apparently didn't try to
> make it actually work because -F is useful without -D, and -F without -O isn't
> well-defined.
>
> P.P.S. -D implying -F doesn't help because -F is the one that takes arguments,
> analogous to -f.

As for the interface, it's a bit surprising that -F, rather than -D, wasn't
used to switch the field handling mode. I.e., I see the mode change as
pertaining more to field handling than to delimiter handling.
I don't have a strong opinion on this, but it may be a bit confusing to users.

BTW it's useful to note existing edge cases in delimiter handling
when considering this new interface:
https://www.pixelbeat.org/docs/coreutils-gotchas.html#cut

So to summarize the new interface, and how it might map to / be described in
coreutils, I see it as:

  -D, --matched-fields
    Use the field order specified, and suppress lines without matching fields
  -F, --regex-fields=LIST
    Like -f, but interpret -d as a regular expression (defaulting
    to a run of whitespace)
  -O, --output-delimiter=STRING
    Use STRING as the output delimiter;
    default is one space with -F, or input delimiter for -f
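
As a rough illustration of the combined -D/-F/-O semantics summarized above, the following shell function emulates them with awk. Everything here is an assumption made for illustration: `cut_df_sketch` is a hypothetical helper, not part of any cut implementation. It handles only whitespace-separated fields and list items of the form N or N-M (no open-ended ranges), and it takes "suppress lines without matching fields" to mean "print a line only if at least one requested field exists".

```shell
# cut_df_sketch LIST [OUTPUT_DELIM]   (hypothetical helper for illustration)
# Reads stdin, splits each line on runs of whitespace, and prints the
# listed fields in the order given, joined by OUTPUT_DELIM (default: space).
cut_df_sketch() {
    awk -v list="$1" -v od="${2- }" '
    {
        n = split(list, items, ",")
        out = ""; sep = ""; matched = 0
        for (i = 1; i <= n; i++) {
            lo = hi = items[i]
            # An item of the form N-M selects the inclusive range N..M.
            if (split(items[i], r, "-") == 2) { lo = r[1]; hi = r[2] }
            for (f = lo + 0; f <= hi + 0 && f <= NF; f++) {
                out = out sep $f
                sep = od
                matched = 1
            }
        }
        # Assumption: lines where no requested field exists are dropped.
        if (matched) print out
    }'
}

printf 'one two three four five six seven eight nine\n' | cut_df_sketch 7,1-3,2
# seven one two three two
```

A second argument changes the separator, for example `printf 'one\ttwo three\n' | cut_df_sketch 3,1 :` prints `three:one`.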

cheers,
Pádraig


