Re: Splitting search results from a "find -print0"

coreutils

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Splitting search results from a "find -print0"

From:	Bernhard Voelker
Subject:	Re: Splitting search results from a "find -print0"
Date:	Fri, 09 Jan 2015 08:41:20 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 01/09/2015 03:16 AM, Pádraig Brady wrote:
> diff --git a/src/split.c b/src/split.c
> index ef672f4..f9e651d 100644
> --- a/src/split.c
> +++ b/src/split.c

...

> @@ -1303,6 +1307,33 @@ main (int argc, char **argv)
>            unbuffered = true;
>            break;
>  
> +        case 't':
> +          {
> +            char neweol = optarg[0];
> +            if (! neweol)
> +              error (EXIT_FAILURE, 0, _("empty record separator"));
> +            if (optarg[1])
> +              {
> +                if (STREQ (optarg, "\\0"))
> +                  neweol = '\0';
> +                else
> +                  {
> +                    /* Provoke with 'split -txx'.  Complain about
> +                       "multi-character tab" instead of "multibyte tab", so
> +                       that the diagnostic's wording does not need to be
> +                       changed once multibyte characters are supported.  */
> +                    error (EXIT_FAILURE, 0, _("multi-character separator 
> %s"),
> +                           quote (optarg));
> +                  }
> +              }
> +            /* Make it explicit we don't support multiple separators.  */
> +            if (0 <= eolchar && neweol != eolchar)
> +              error (EXIT_FAILURE, 0, _("incompatible record separator"));

I don't see why we should forbid "split -t a -t b". Shouldn't we just
let the latter win as for other tools' options?

If we go with something like above: I think "incompatible record separator"
is not very descriptive.  I'd rather forbid multiple -t options at all and
change the error diagnostic accordingly.

> diff --git a/tests/local.mk b/tests/local.mk
> index 6fc8599..14dfaf3 100644
> --- a/tests/local.mk
> +++ b/tests/local.mk
> @@ -355,6 +355,7 @@ all_tests =                                       \
>    tests/split/b-chunk.sh                     \
>    tests/split/fail.sh                                \
>    tests/split/lines.sh                               \
> +  tests/split/lines-sep.sh          \
>    tests/split/line-bytes.sh                  \
>    tests/split/l-chunk.sh                     \
>    tests/split/r-chunk.sh                     \

The line continuation character \ is misaligned with the others.
BTW: Why is it we don't use either a single or multiple blanks before
the line continuation character here (instead of TABs) anyway?
This always looks distracting.

> diff --git a/tests/split/lines-sep.sh b/tests/split/lines-sep.sh
> new file mode 100755
> index 0000000..74857fe
> --- /dev/null
> +++ b/tests/split/lines-sep.sh
> @@ -0,0 +1,101 @@
> +#!/bin/sh
> +# test split with custom record separators
> +
> +# Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or
> +# (at your option) any later version.
> +
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
> +print_ver_ split
> +
> +# Prepare input/expected-output files,
> +# with newline, zero, colon line-separators.
> +printf '1\n2\n3\n4\n5\n' > in1-nl || framework_failure_
> +printf '1\n2\n' > exp1-nl || framework_failure_
> +printf '3\n4\n' > exp2-nl || framework_failure_
> +printf '5\n'    > exp3-nl || framework_failure_
> +
> +printf '1\0002\0003\0004\0005\000' > in1-z || framework_failure_
> +printf '1\0002\000' > exp1-z || framework_failure_
> +printf '3\0004\000' > exp2-z || framework_failure_
> +printf '5\000'      > exp3-z || framework_failure_
> +
> +printf '1:2:3:4:5:' > in1-cln || framework_failure_
> +printf '1:2:' > exp1-cln || framework_failure_
> +printf '3:4:' > exp2-cln || framework_failure_
> +printf '5:'   > exp3-cln || framework_failure_
> +
> +
> +run_split()
> +{
> +  # test number (should be unique, to avoid output file dups)
> +  num=$1

Why not define _num here and 'let _num++' instead of having to care
about it when calling (see below)?

> +  # suffix of test files (nl/z/cln)
> +  suf=$2

Is the suffix relevant? run_split could hide this detail as well.

> +  # other parameters (if any) are passed to 'split'
> +  shift 2
> +
> +  split --lines=2 "$@" in1-$suf x$num- > out-$suf || return 1

ouch, we only check lines_split() with this while other split_types
should be checked, too.

> +
> +  compare exp1-$suf x$num-aa || return 1
> +  compare exp2-$suf x$num-ab || return 1
> +  compare exp3-$suf x$num-ac || return 1
> +  test -f x$num-ad && return 1
> +
> +  return 0
> +}
> +
> +NL='
> +'
> +
> +# Test newline, without '-t' option (the default)
> +run_split 1 nl         || { warn_ "test 1 failed" ; fail=1 ; }
> +
> +# Test newline specified as custom record separator
> +run_split 2 nl -t"$NL" || { warn_ "test 2 failed" ; fail=1 ; }
> +
> +# Test null line-separator with '-t'
> +run_split 4 z -t'\0'   || { warn_ "test 4 failed" ; fail=1 ; }

run_split 3 ... ?
... or care inside run_split about it as proposed above.

> +
> +# Test non-default line-separator with '-t'
> +run_split 5 cln -t:    || { warn_ "test 5 failed" ; fail=1 ; }

Likewise:
run_split 4 ... ?

> +
> +
> +#
> +# Test usage edge cases
> +#
> +
> +# Should fail: '-t' requires an argument
> +{ split -t </dev/null >/dev/null 2>/dev/null || test $? -ne 1; } &&
> +  { warn_ "-t without argument did not trigger an error" ; fail=1 ; }
> +
> +# should fail: multi-character separator
> +{ split -txx </dev/null >/dev/null 2>&1 || test $? -ne 1; } &&
> +  { warn_ "-txx did not trigger an error" ; fail=1 ; }
> +
> +# should fail: different separators used
> +{ split -ta -tb </dev/null >/dev/null 2>&1 || test $? -ne 1; } &&
> +  { warn_ "-ta -tb did not trigger an error" ; fail=1 ; }
> +
> +# should fail: different separators used, including default
> +{ split -t"$NL" -tb </dev/null >/dev/null 2>&1 || test $? -ne 1; } &&
> +  { warn_ "-t\$NL -tb did not trigger an error" ; fail=1 ; }

Is it necessary to check against exit(1) in the above?
We don't in the following (positive case) either ...

> +
> +# should not fail: same separator used multiple times
> +split -t: -t: </dev/null >/dev/null 2>&1 ||
> +  { warn_ "-t: -t: triggered an error" ; fail=1 ; }
> +
> +
> +
> +Exit $fail
> -- 2.1.0

Have a nice day,
Berny

[Prev in Thread]

Current Thread

[Next in Thread]

Re: Splitting search results from a "find -print0", (continued)

Prev by Date: Re: Splitting search results from a "find -print0"
Next by Date: Re: Splitting search results from a "find -print0"
Previous by thread: RE: Splitting search results from a "find -print0"
Next by thread: Re: Splitting search results from a "find -print0"
Index(es):
- Date
- Thread