coreutils

Re: "expr" won't match empty strings


From: Luke Kendall
Subject: Re: "expr" won't match empty strings
Date: Sun, 03 Aug 2014 02:48:51 +1000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.24) Gecko/20101027 Thunderbird/2.0.0.24 Mnenhy/0.7.6.666

On 02/08/14 20:40, Pádraig Brady wrote:
On 08/02/2014 06:03 AM, Luke Kendall wrote:
I'm hesitant to report this, but I think it's an actual bug in expr that's been 
there from day one.

I believe that expr, when used to match regular expressions, should use the 
success/failure of the pattern match to determine the exit code.

But instead, I believe "expr" uses the length of the matched string to 
determine its exit code.  So when the regexp correctly matches an empty string, expr 
returns failure, despite the match.  Here's a simple example:

$ expr " " : "^ *$" && echo Matched.
1
Matched.
$ expr "" : "^ *$" && echo Matched.
0

And compare that to what sed and grep do:


$ echo "" | sed -n 's/^ *$/& - yep/p'
 - yep

$ printf "a\n\n" | grep '^$' && echo "A match."

A match.

I'd like to suggest that expr be changed to use the success/fail of the pattern 
match to determine the exit status, as all the other unix tools do.

I don't think this alteration of semantics would break many existing scripts, 
for two reasons:
1) It must be unusual to use regexps that can match an empty string, because 
expr does not report a match for that corner case, so to correctly handle it, 
the user must have had to add an explicit test for the input string being 
empty: and this will still work (it's just that with the suggested change, that 
extra code becomes redundant).
2) Based on my own experience, it's unusual to use expr ":" with patterns that 
can match the empty string - it's taken me over 30 years to notice this oddity!

If you think this would be a good change, but don't have time to do anything, 
let me know and I'll have a go and submit a patch.

The exit status of expr is a common gotcha:

$ expr 2 - 1; echo $?
1
0

$ expr 2 - 2; echo $?
0
1

That's a good example, and it makes sense: by definition, expr's exit status is 'error' (strictly, 1) if the arithmetic expression yields 0.

$ expr ' ' : '^ *$'; echo $?
1
0

$ expr '' : '^ *$'; echo $?
0
1

POSIX states that exit status of 1 is used if "the expression evaluates to null or 
zero".

I guess the POSIX definition of the exit status really means "the null string" when it says "null". (That wasn't obvious to me on first or second reading, actually.)
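For reference, a small sketch that prints only the exit status for the cases discussed in this thread. The helper name expr_status is made up for illustration; the last case relies on POSIX specifying status 2 for an invalid expression.

```shell
# Hypothetical helper: run expr quietly and print only its exit status.
expr_status() {
    expr "$@" >/dev/null 2>&1
    echo $?
}

expr_status 2 - 1          # result is non-zero
expr_status 2 - 2          # result is 0
expr_status ' ' : ' *$'    # match of length 1
expr_status '' : ' *$'     # match of length 0
expr_status 1 +            # invalid expression
```

The second and fourth cases give the same status even though one is arithmetic and the other is a successful pattern match, which is the behaviour under discussion.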

This definition of the exit status for the pattern match is the oddity: it does not allow the user to distinguish a successful match of a null string from a failed pattern match.

It seems a bad design decision to have exit status 1 indicate that the matched string is null, rather than that the match failed.
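One way to work around this with expr as it stands is to prefix both the string and the pattern with the same literal character, so that any successful match has length at least 1. This is only a sketch: the helper name is_blank and the choice of X as the prefix are illustrative, and the count that expr prints is then one larger than the real match length.

```shell
# Hypothetical helper: succeed when "$1" is empty or consists only of spaces.
is_blank() {
    # The leading X on both sides guarantees a successful match has
    # length >= 1, so exit status 1 can only mean "no match".
    expr "X$1" : 'X *$' >/dev/null
}

for s in "" " " "a"; do
    if is_blank "$s"; then
        echo "[$s] matched"
    else
        echo "[$s] did not match"
    fi
done
```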

In this case, even though it is a match, the expression does evaluate to zero,
which is awkward, though conformant to POSIX (and Solaris and FreeBSD FWIW).

True, though it's counter to all other regexp evaluations for a successful match to return null or zero, since True is normally equated with a non-null and non-zero quantity. (The actual matched substring is usually a side value.)

I do understand that everyone has faithfully implemented the logic of using the length of the match instead of the success of the match.


Though I'm not sure we can change that, which would essentially
be changing the handling of the '*' in the expression. Consider:

   printf '%s\n' 1 2 '' 3 |
   while read line; do
     expr "$line" : '^[0-9]*$' >/dev/null || break # at first blank line
     echo process "$line"
   done

It would not be changing the definition of the '*' in the expression.

The above is using a regexp which would not do the same thing if used in sed or grep (etc. etc.), because it is matching 0 or more repetitions of a digit. It only happens to work this way in expr because of the expr oddity.

You can choose the regexp which *will* work in those utilities *and* expr, too, to get the same termination on the empty line, but this time because the pattern match genuinely fails:

  printf '%s\n' 1 2 '' 3 |
  while read line; do
    expr "$line" : '^[0-9][0-9]*$' >/dev/null || break # at 1st blank line
    echo process "$line"
  done
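To see the difference outside expr, the same input can be fed to grep with each pattern. This is just a sketch using grep -c to count matching lines; the variable names are arbitrary.

```shell
# Four input lines; the third one is empty.
loose=$(printf '%s\n' 1 2 '' 3 | grep -c '^[0-9]*$')
strict=$(printf '%s\n' 1 2 '' 3 | grep -c '^[0-9][0-9]*$')

# The zero-or-more pattern also matches the empty line; the
# one-or-more pattern does not.
echo "zero-or-more digits: $loose matching lines"
echo "one-or-more digits:  $strict matching lines"
```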


BTW, using a leading ^ in the expression is redundant and non-portable.

I know, I just used it to make it clearer that the same expression in other utilities has the natural interpretation.

AFAIK, it's only expr that uses the length of the matched pattern instead of the success of the match as the exit status.

Frankly, to me it looks like a long-standing design error, but if that's the definition, well, so be it I guess!

thanks,
Pádraig.

Regards,

luke



