Re: Splitting search results from a "find -print0"

coreutils

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Splitting search results from a "find -print0"

From:	Pádraig Brady
Subject:	Re: Splitting search results from a "find -print0"
Date:	Fri, 09 Jan 2015 12:29:23 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 09/01/15 07:41, Bernhard Voelker wrote:
> On 01/09/2015 03:16 AM, Pádraig Brady wrote:
>> diff --git a/src/split.c b/src/split.c
>> index ef672f4..f9e651d 100644
>> --- a/src/split.c
>> +++ b/src/split.c
> 
> ...
> 
>> @@ -1303,6 +1307,33 @@ main (int argc, char **argv)
>>            unbuffered = true;
>>            break;
>>  
>> +        case 't':
>> +          {
>> +            char neweol = optarg[0];
>> +            if (! neweol)
>> +              error (EXIT_FAILURE, 0, _("empty record separator"));
>> +            if (optarg[1])
>> +              {
>> +                if (STREQ (optarg, "\\0"))
>> +                  neweol = '\0';
>> +                else
>> +                  {
>> +                    /* Provoke with 'split -txx'.  Complain about
>> +                       "multi-character tab" instead of "multibyte tab", so
>> +                       that the diagnostic's wording does not need to be
>> +                       changed once multibyte characters are supported.  */
>> +                    error (EXIT_FAILURE, 0, _("multi-character separator 
>> %s"),
>> +                           quote (optarg));
>> +                  }
>> +              }
>> +            /* Make it explicit we don't support multiple separators.  */
>> +            if (0 <= eolchar && neweol != eolchar)
>> +              error (EXIT_FAILURE, 0, _("incompatible record separator"));
> 
> I don't see why we should forbid "split -t a -t b". Shouldn't we just
> let the latter win as for other tools' options?

This code is mostly copied from the join -t processing.
I was wondering myself and added this comment:
/* Make it explicit we don't support multiple separators.  */
I thought that slightly more useful than having a
default `split -ta` that could be overridden by `split -tb`.

> If we go with something like above: I think "incompatible record separator"
> is not very descriptive.  I'd rather forbid multiple -t options at all and
> change the error diagnostic accordingly.

Agreed on the message.  I'll change to:
_("multiple separator characters specified")

>> diff --git a/tests/local.mk b/tests/local.mk
>> index 6fc8599..14dfaf3 100644
>> --- a/tests/local.mk
>> +++ b/tests/local.mk
>> @@ -355,6 +355,7 @@ all_tests =                                      \
>>    tests/split/b-chunk.sh                    \
>>    tests/split/fail.sh                               \
>>    tests/split/lines.sh                              \
>> +  tests/split/lines-sep.sh          \
>>    tests/split/line-bytes.sh                 \
>>    tests/split/l-chunk.sh                    \
>>    tests/split/r-chunk.sh                    \
> 
> The line continuation character \ is misaligned with the others.

Fixed that and a couple of other misalignments.

> BTW: Why is it we don't use either a single or multiple blanks before
> the line continuation character here (instead of TABs) anyway?
> This always looks distracting.

It's often handier to align with tabs, and given there doesn't
need to be a mixture of tabs and spaces here, it's immune to
variant tab width settings, which leading tabs in source is not.

>> diff --git a/tests/split/lines-sep.sh b/tests/split/lines-sep.sh

>> +  split --lines=2 "$@" in1-$suf x$num- > out-$suf || return 1
> 
> ouch, we only check lines_split() with this while other split_types
> should be checked, too.

Good point. I've rewritten the test to avoid the test number
and suffixes, and loop over separators and relevant modes.

>> +# should fail: different separators used, including default
>> +{ split -t"$NL" -tb </dev/null >/dev/null 2>&1 || test $? -ne 1; } &&
>> +  { warn_ "-t\$NL -tb did not trigger an error" ; fail=1 ; }
> 
> Is it necessary to check against exit(1) in the above?
> We don't in the following (positive case) either ...

I was worried we wouldn't catch a seg fault with the above.
I've a patch pending to adjust other occurrences of
the command && fail=1 idiom too.

>> +# should not fail: same separator used multiple times
>> +split -t: -t: </dev/null >/dev/null 2>&1 ||
>> +  { warn_ "-t: -t: triggered an error" ; fail=1 ; }

seg fault is implicitly protected against here.

thanks for the careful review!

Latest version attached.

Pádraig.

split-t.patch
Description: Text Data

[Prev in Thread]

Current Thread

[Next in Thread]

Re: Splitting search results from a "find -print0", (continued)

Prev by Date: Re: Splitting search results from a "find -print0"
Next by Date: Re: Splitting search results from a "find -print0"
Previous by thread: Re: Splitting search results from a "find -print0"
Next by thread: Re: Splitting search results from a "find -print0"
Index(es):
- Date
- Thread