bug-bash

Re: [DOC] Incomplete explanation about the regex =~ operator


From: kevin
Subject: Re: [DOC] Incomplete explanation about the regex =~ operator
Date: Thu, 17 Jan 2019 07:57:25 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

On 17/01/2019 at 07:53, kevin wrote:
On 12/01/2019 at 23:27, Chet Ramey wrote:
On 1/12/19 1:14 AM, kevin wrote:

Moreover, the explanation in the Bash FAQ is unclear; it lacks examples
showing when "an interference" occurs.
What is "an interference"?


Look at the following answer to get an overview of the issue:
https://stackoverflow.com/a/12696899
That answer is correct: bash uses the C library's regexp library and
only guarantees that POSIX EREs work.
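As a minimal sketch (assuming bash 3.2 or later), a portable pattern sticks to POSIX ERE syntax such as bracket expressions. Shorthand like \d is not part of POSIX ERE, so whether it works depends on the C library bash was built against. The variable names s and re are only for illustration.

```shell
#!/usr/bin/env bash
# Extract the first run of digits using only POSIX ERE syntax.
# [[:digit:]] is a POSIX bracket expression, so any conforming
# C library regex engine accepts it.
s='abc123def'
re='[[:digit:]]+'
if [[ $s =~ $re ]]; then
  # BASH_REMATCH[0] holds the part of $s matched by the whole regex.
  echo "${BASH_REMATCH[0]}"
fi
```

Keeping the regex in a variable and expanding it unquoted also avoids having to think about how the [[ parser treats brackets and quotes inside a literal pattern.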

I do not speak English very well.
Your English is fine.

The Bash FAQ indicates that the shell behaves differently in a conditional
expression that uses a regular expression. However, the Bash FAQ does not
give examples that would provide a concrete idea.
I think Greg Wooledge's site has some examples along these lines.

"In versions of bash prior to bash-3.2, the effect of quoting the regular
expression argument to the [[ command's =~ operator was not specified. *The
practical effect* was that double-quoting the pattern argument required
backslashes to quote special pattern characters, *which interfered with*
the backslash processing performed by double-quoted word expansion and was
inconsistent with how the == shell pattern matching operator treated quoted
characters."

I do not see the practical effect because I cannot find concrete cases (or
examples). In other words, I do not understand the justification.
The ambiguity is that the backslash is special to both the shell and the
regular expression matching engine. Since double-quoting the pattern
enables backslash processing as part of word expansion, what should a
string like "abc\$" match? That gets passed to the regular expression
engine as "abc$" after being processed by the shell's word expansions.
Since the unquoted $ in the pattern means to anchor the pattern at the
end of the string, it's ambiguous what the user meant. If you use a literal
pattern, you can use single quotes to make your intent clear ('abc\$'),
but if you want some expansion to be performed, you have to experiment
with the correct number of backslashes to use to get the right pattern
passed through to the regexp engine.
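To make the backslash ambiguity concrete, here is a small sketch (assuming bash 3.2 or later with default options) that uses printf to show what a double-quoted word becomes after the shell's own backslash processing, and then matches with that quoted word, which bash 3.2+ compares as a literal string.

```shell
#!/usr/bin/env bash
# Inside double quotes, the shell removes a backslash that precedes
# $, `, ", \ or newline, before the regex engine ever sees the word.
printf '%s\n' "abc\$"     # prints abc$   (the backslash is consumed)
printf '%s\n' "abc\\\$"   # prints abc\$  (three backslashes keep one)

# Since bash-3.2 a quoted pattern is matched as a literal string,
# so "abc\$" looks for a literal dollar sign instead of anchoring.
[[ 'abc$' =~ "abc\$" ]] && echo 'abc$ matches (literal dollar sign)'
[[ 'abc'  =~ "abc\$" ]] || echo 'abc does not match (no end anchor)'
```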

Beginning with bash-3.2, the behavior of =~ is documented to be the same
as ==: quoting any part of the pattern forces it to be matched as a string,
which means characters special to regular expressions have to be quoted
before they are passed to the regexp matching engine. The shell does this
by processing the quoted portions of the pattern and inserting backslashes
to quote special pattern characters.
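A minimal sketch of the bash-3.2+ rule, assuming default shell options (compat31 unset): store the regex in a variable and expand it unquoted when it should act as a regex; quoting the same expansion turns it into a literal string match, exactly as quoting does for ==. The variable name re is only illustrative.

```shell
#!/usr/bin/env bash
re='^a.c$'   # a regex stored verbatim in a variable

[[ abc =~ $re ]]       && echo 'unquoted $re: regex, abc matches'
[[ abc =~ "$re" ]]     || echo 'quoted $re: literal, abc does not match'
[[ '^a.c$' =~ "$re" ]] && echo 'quoted $re: matches the literal text ^a.c$'

# The same quoting rule applies to the == pattern-matching operator.
[[ abc == a?c ]]   && echo 'unquoted a?c: glob, abc matches'
[[ abc == "a?c" ]] || echo 'quoted a?c: literal, abc does not match'
```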

Finally, the fact that the shell behaves differently in this case
should be documented in the man page and the Texinfo source.
It is. That is the difference. The effect of quoting characters in the
pattern is now specified where it was not in bash-3.1 and earlier versions.

I looked at Greg Wooledge's site <https://mywiki.wooledge.org/BashGuide/TestsAndConditionals#Conditional_Blocks_.28if.2C_test_and_.5B.5B.29>:

    Since *[[* isn't a normal command (like [ is), but a /shell
    keyword/, *it has special magical powers*. *It parses its
    arguments before they are expanded by Bash and does the expansion
    itself*, taking the result as a single argument, even if that
    result contains whitespace. (In other words, [[ does not allow
    word-splitting of its arguments.) /However/, be aware that simple
    strings still have to be quoted properly. [[ treats a space
    outside of quotes as an argument separator, just like Bash
    normally would.
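The wiki passage says that [[ does not word-split its arguments. A short sketch of that difference, assuming bash (the variable s is only for illustration):

```shell
#!/usr/bin/env bash
s='a b'   # a value containing whitespace

# [[ does not word-split $s, so the comparison sees one word: "a b".
[[ $s == 'a b' ]] && echo '[[ without quotes: equal'

# [ is an ordinary builtin: unquoted $s is split into two words,
# so [ receives four arguments (a, b, =, "a b") and reports an error.
[ $s = 'a b' ] 2>/dev/null
echo "[ without quotes: exit status $?"   # 2 means a usage error, not "false"

# Quoting restores the intended comparison for [.
[ "$s" = 'a b' ] && echo '[ with quotes: equal'
```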

Unfortunately, there is no example that shows how *[[* differs from the usual shell operation. I know that the documentation does not describe this particular property (feature) of "[[", and that "=~" was adjusted to follow the "==" operator, but I still do not understand why the normal shell rules could not have been used. In your example, a user could use single quotes to remove the special meaning of the $ sign: "abc'$'".

I made a mistake (typo); it should read: abc'\$'
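A small sketch comparing the spellings discussed in this thread, assuming bash 3.2 or later with default options. Quoting only part of the pattern word makes just those characters literal, while the unquoted rest keeps its regex meaning; inside single quotes the backslash is literal too, so abc'\$' asks for a backslash followed by a dollar sign.

```shell
#!/usr/bin/env bash
# abc$ unquoted: $ is the end-of-string anchor.
[[ abc =~ abc$ ]] && echo '1: abc matches (anchor)'

# abc'$': the quoted $ is a literal dollar sign.
[[ 'abc$' =~ abc'$' ]] && echo '2: abc$ matches (literal dollar sign)'
[[ abc =~ abc'$' ]]    || echo '3: abc alone does not match'

# abc'\$': both the backslash and the $ are literal characters.
[[ 'abc\$' =~ abc'\$' ]] && echo '4: abc\$ matches (literal backslash and dollar)'
[[ 'abc$'  =~ abc'\$' ]] || echo '5: abc$ does not match'
```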
