coreutils

Re: cut -DF


From: Rob Landley
Subject: Re: cut -DF
Date: Fri, 7 Jan 2022 12:34:50 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0


On 1/6/22 5:02 PM, Assaf Gordon wrote:
> Hello,
> 
> On 2022-01-06 7:35 a.m., Pádraig Brady wrote:
>> Thanks for taking the time to consolidate options/functionality
>> across different implementations.  This is important for users.
>> Some notes below...
>> 
>> On 05/01/2022 16:23, Rob Landley wrote:
>>> Around 5 years ago toybox added the -D, -F, and -O options to cut:
>>>
>>>      -D  Don't sort/collate selections or match -fF lines without delimiter
>>>      -F  Select fields separated by DELIM regex
>>>      -O  Output delimiter (default one space for -F, input delim for -f)
>>>
>> 
>> As I see it, the main functionalities added here:
>>    - reordering of selected fields
>>    - adjusted suppression of lines without matching fields
>>    - regex delimiter support
>> 
>> I see regex support as less important, but still useful.
>> 
> 
> 
> Attached is a suggestion for initial implementation of "cut -FDO".
> It's split into smaller steps to ease review.
> 
> The main issue is that the current "cut_fields" and "cut_bytes" are
> highly optimized for speed, so I left them as-is and created a secondary
> set of 'cut' functions - slower but with additional options.

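As a rough illustration of how -F, -D and -O could fit into such a secondary, slower code path, here is a minimal C sketch. It assumes POSIX <regex.h> with ERE syntax, a fixed cap of 64 fields per line, and that "don't sort/collate" means fields come out in the order requested, duplicates included. The names split_fields, cut_regex_fields and MAX_FIELDS are hypothetical; this is not the toybox code or the attached patch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 64   /* hypothetical per-line cap, for simplicity */

/* Split LINE on every non-empty match of RE, storing each field's start and
 * length. Returns the field count, or -1 if there are more than MAX_FIELDS.
 * A zero-length match ends the split (a simplification). */
static int split_fields(const char *line, const regex_t *re,
                        const char *start[], size_t len[])
{
    const char *p = line;
    regmatch_t m;
    int n = 0, eflags = 0;

    while (regexec(re, p, 1, &m, eflags) == 0 && m.rm_eo > m.rm_so) {
        if (n == MAX_FIELDS)
            return -1;
        start[n] = p;
        len[n++] = (size_t)m.rm_so;   /* text before the delimiter */
        p += m.rm_eo;                 /* resume after the delimiter */
        eflags = REG_NOTBOL;          /* p is no longer at start of line */
    }
    if (n == MAX_FIELDS)
        return -1;
    start[n] = p;
    len[n++] = strlen(p);             /* last field runs to end of line */
    return n;
}

/* Write the 1-based FIELDS of LINE into OUT in the order given (no sorting,
 * duplicates kept), joined by OUTDELIM. Out-of-range fields are skipped.
 * Returns 0 on success, -1 on a bad pattern, too many fields or short OUT.
 * A real cut would compile the pattern once, not once per line. */
int cut_regex_fields(const char *line, const char *pattern,
                     const int *fields, size_t nfields,
                     const char *outdelim, char *out, size_t outsz)
{
    regex_t re;
    const char *start[MAX_FIELDS];
    size_t len[MAX_FIELDS];
    size_t used = 0;
    int n, first = 1;

    assert(outsz > 0);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    n = split_fields(line, &re, start, len);
    regfree(&re);
    if (n < 0)
        return -1;

    out[0] = '\0';
    for (size_t i = 0; i < nfields; i++) {
        int f = fields[i];
        if (f < 1 || f > n)
            continue;
        size_t dl = first ? 0 : strlen(outdelim);
        size_t fl = len[f - 1];
        if (used + dl + fl + 1 > outsz)
            return -1;
        memcpy(out + used, outdelim, dl);
        used += dl;
        memcpy(out + used, start[f - 1], fl);
        used += fl;
        out[used] = '\0';
        first = 0;
    }
    return 0;
}
```

Under these assumptions, asking for fields 3,1 of "a::b,c" with the pattern '[:,]+' and an output delimiter of one space gives "c a", and a line starting with a delimiter gets an empty first field.
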
There was a whole special case -d$'\n' in busybox to cut by line that I haven't
found any documentation for, and it looks like that was copied from coreutils...

$ echo -e 'one\ntwo\nthree\nfour\nfive' | cut -d$'\n' -f 2-3
two
three

So I'm guessing there's already more than one codepath. :)

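The behaviour in that transcript can be modelled as treating the whole input as a single record whose fields are lines. A minimal C sketch of just that observable behaviour (cut_newline_fields is a hypothetical name, and this says nothing about how coreutils or busybox actually implement it):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy lines LO..HI (1-based, inclusive) of INPUT into OUT, each followed by
 * '\n'. Models `cut -d$'\n' -f LO-HI`: the whole input is one record and
 * every line is one field. Returns 0 on success, -1 if OUT is too small. */
int cut_newline_fields(const char *input, int lo, int hi,
                       char *out, size_t outsz)
{
    size_t used = 0;
    int lineno = 1;
    const char *p = input;

    assert(lo >= 1 && lo <= hi);
    if (outsz == 0)
        return -1;
    while (*p != '\0') {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);

        if (lineno >= lo && lineno <= hi) {
            if (used + len + 2 > outsz)   /* field + '\n' + final NUL */
                return -1;
            memcpy(out + used, p, len);
            used += len;
            out[used++] = '\n';
        }
        if (nl == NULL)
            break;                        /* last line had no newline */
        p = nl + 1;
        lineno++;
    }
    out[used] = '\0';
    return 0;
}
```

Fed "one\ntwo\nthree\nfour\nfive\n" with lines 2 to 3, this produces "two\nthree\n", matching the transcript.
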
> If this is acceptable, I'll go on to clean up the patches, add more
> tests and write documentation.
> 
> There are likely some edge-cases regarding regex matching that need to 
> be decided upon (e.g. BRE or ERE, what about BOL/EOL anchors, groups, etc.).

Toybox is doing ERE by default because it was introduced post-y2k:

  https://github.com/landley/toybox/blob/0.8.6/toys/posix/cut.c#L217

And ignoring BOL/EOL:

  https://github.com/landley/toybox/blob/0.8.6/toys/posix/cut.c#L140

because I don't see how BOL/EOL applies to delimiters _between_ elements? (Any
delimiter before the first element or after the last element would mean another
empty element at the edge?)

Busybox inherited both behaviors.

Thanks,

Rob


