bug-bash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bash "extglob" needs to upgrade at least like zsh "kshglob"


From: Chet Ramey
Subject: Re: bash "extglob" needs to upgrade at least like zsh "kshglob"
Date: Wed, 16 Nov 2022 16:47:20 -0500
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 11/14/22 6:06 AM, Koichi Murase wrote:

However, I noticed two strange behaviors of the current engine.
Before adjusting the behavior of the new engine and submitting it for
review, I would like to confirm the (expected) behavior of the current
engine in the current devel.

These two behaviors finally turned out to be both related to the
matching of bracket expression by the function `BRACKMATCH'
(lib/glob/sm_loop.c).

----------------------------------------------------------------------
1. pattern [[=B=]][c] matches with c

   $ bash-devel --norc
   $ [[ Bc == [[=B=]][c] ]]; echo $?
   0      <-- OK. This is expected.
   $ [[ c == [[=B=]][c] ]]; echo $?
   0      <-- This is unexpected.

This is clearly a problem, and your fix is a good one.


----------------------------------------------------------------------
2. bracket expression sometimes match or unmatch the slash with
   FNM_PATHNAME.

   FNM_PATHNAME is only used in two places in the Bash codebase.  1)
   One is for the glob matching for the filenames in the directory
   (lib/glob/glob.c).  However, this actually does not seem to have an
   actual effect because FNM_PATHNAME only causes differences in the
   matching when the target string contains a slash but the filenames
   do not contain any slashes.  2) The other is the filtering of the
   pathname-expansion results with GLOBIGNORE (pathexp.c).  So the only
   way to test the behavior of Bash's strmatch with FNM_PATHNAME
   (without writing a C program to directly use the function
   `strmatch') is to use GLOBIGNORE.

This is true, but the two uses above are not directly analogous to
fnmatch(). The bash implementation, due to its origin as an fnmatch()
implementation, uses the same flag names but uses FNM_PATHNAME to signal
the matching engine that it's matching filenames for pathname expansion.

That triggers the difference. The first part of your patch is correct:
a bracket expression can never match a slash in the string. This is
common to both fnmatch and what POSIX calls "Patterns Used For Filename
Expansion."

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_03

fnmatch with FNM_PATHNAME only has to avoid matching the slash with a
bracket expression. The shell has an additional constraint: a slash that
appears in a bracket expression renders the bracket expression void and
requires the `[' to be matched explicitly. That's why there have to be
tests for slash in BRACKMATCH. There are two bugs: the off-by-one error
you note and matching the open bracket explicitly.


   * In fnmatch(3), a bracket expression never matches a / with
     FNM_PATHNAME.

This is one part of the fix.
   * In Bash's strmatch, a bracket expression sometimes matches `/' and
     sometimes does not.  In the codebase, `/' is excluded only when it
     explicitly appears after another character in the bracket
     expression (lib/glob/sm_loop.c:574) even though the comment
     mentions [/].  This is the reason that only [a/] fails with Bash's
     implementation in cases #2..#6 in the above result.

That's the consequence of the test being in the wrong place in BRACKMATCH.


   * What is happening with Bash's implementation in cases #7..#12 is
     related the assumption of the backtracking trick for `*': The
     trick for `*' backtracking explained in the code comment
     lib/glob/sm_loop.c:320---"This is a clever idea from
     glibc"---relies on the behavior that the bracket expression never
matches a slash that `*' cannot match.

These two changes should satisfy that.

The cleverness is due to Russ Cox, a really smart guy who figured this
stuff out first:

https://research.swtch.com/glob

(https://swtch.com/~rsc/regexp/ is a collection of his writing on regular
expressions. It's well worth reading.)

I attached the patch I applied. I didn't include your fix to issue 1 above.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/

Attachment: bracket-slash.patch
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]