Re: devel: Questions about quoting in the new replacement ${var/pat/&}
From: Chet Ramey
Subject: Re: devel: Questions about quoting in the new replacement ${var/pat/&}
Date: Mon, 11 Oct 2021 12:08:04 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:91.0) Gecko/20100101 Thunderbird/91.2.0
On 10/5/21 4:41 AM, Koichi Murase wrote:
> I have questions on the new feature ${var/pat/&} in the devel branch.
>
>> commit f188aa6a013e89d421e39354086eed513652b492 (upstream/devel)
>> Author: Chet Ramey <chet.ramey@case.edu>
>> Date: Mon Oct 4 15:30:21 2021 -0400
>>
>> enable support for using `&' in the pattern substitution replacement
>> string
>>
>> Any unquoted instances of & in STRING are replaced with the matching
>> portion of PATTERN. Backslash is used to quote & in STRING; the
>> backslash is removed in order to permit a literal & in the
>> replacement string. Users should take care if STRING is
>> double-quoted to avoid unwanted interactions between the backslash
>> and double-quoting. Pattern substitution performs the check for &
>> after expanding STRING; shell programmers should quote backslashes
>> intended to escape the & and inhibit replacement so they survive any
>> quote removal performed by the expansion of STRING.
>
> I would very much like this change introduced in the latest commit
> f188aa6a in devel as it would enable many more string manipulations
> with a simple construct, but I feel the current treatment of quoting
> has problems:
>
> 1. There is no way to specify an arbitrary string in replacement in a
> way that is compatible with both bash 5.1 and 5.2.
It's a change that assigns meaning to a character that was previously
valid, not an error. It's probably going to require a shell option.
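
A minimal sketch of what such an option could look like in use. It assumes the option name patsub_replacement, which is what released bash 5.2 uses; that name does not appear in this thread. Shells that do not know the option reject the shopt call, so the failure is ignored and the old behavior applies either way.

```shell
#!/usr/bin/env bash
# Assumption: the option is named patsub_replacement (as in released bash 5.2).
# Older shells do not recognize it, so the error is silenced and ignored.
shopt -u patsub_replacement 2>/dev/null || true

str='X&Y&Z' pat='Y' rep='A&B'

# With the feature off (or absent), & in the replacement is an ordinary character.
out=${str/$pat/$rep}
printf '%s\n' "$out"    # X&A&B&Z
```

With the option turned off, the same line should give the same result on shells from before and after the change, at the cost of not using the new feature at all.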
>
> 2. There is no way to insert a backslash before the matched part
> (which I'd think would be one of the typical usages of &).
This is quite reasonable, and a minor change. If the replacement function
treats backslash specially by allowing it to quote `&', it should also
allow it to escape a backslash.
>
> I below describe the details of each, followed by my suggestion or
> discussion on an alternative design.
>
> ----------------------------------------------------------------------
> 1. How to specify an arbitrary string in replacement compatibly with
> both bash 5.1 and 5.2?
>
> Currently any & in the replacement is replaced by the matched part
> regardless of whether & is quoted in the parameter-expansion context
> or not. Even the result of the parameter expansions and other
> substitutions are subject to the special treatment of &, which makes
> it non-trivial to specify an arbitrary string to the replacement
> ${var/pat/rep}.
The documentation goes into this in some detail, including specifying the
expansions that REP undergoes.
> $ str='X&Y&Z' pat='Y' rep='A&B'
> $ echo ${str/$pat/XXXX}
> X&A&B&Z
>
> where XXXX is some string that represents the literal "$rep" (i.e.,
> 'A&B'). A naive quoting of "$rep" does not work:
>
> $ echo "1:${str/$pat/"$rep"}"
> 1:X&AYB&Z
Wouldn't it be better to treat it in the standard way a double-quoted
parameter expansion would be treated? The double-quoted expansion is
already well-specified. People know how to get a backslash through
double quoting, even in a context, like this one, where quote removal
is performed.
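
A hedged illustration of the double-quoting question. It relies on the released bash 5.2 manual rather than on the devel branch discussed here: the manual documents that a quoted replacement such as "$rep" is taken literally, while an unquoted one is scanned for &. The variable names are illustrative.

```shell
#!/usr/bin/env bash
var=abcdef
rep='& '

# Quoted replacement: taken literally in bash 5.1 and, per its manual,
# in released bash 5.2.
quoted=${var/abc/"$rep"}
printf '%s\n' "$quoted"      # & def

# Unquoted replacement: '& def' in bash 5.1, 'abc def' in released bash 5.2
# with the feature enabled.
unquoted=${var/abc/$rep}
printf '%s\n' "$unquoted"
```

The devel snapshot under discussion behaved differently for the quoted case, as the output quoted above shows, so this sketch describes the released behavior only.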
>
> I would have expected it to work because $pat will lose special
> meaning and be treated literally when it is quoted as "$pat". For
> example, the glob patterns *?[ etc. and anchors # and % in $pat will
> lose their special meaning when $pat is quoted:
>
> $ v='A' p='?'; echo "${v/$p/B}"; echo "${v/"$p"/B}"
> B
> A
> $ v='A' p='#'; echo "${v/$p/B}"; echo "${v/"$p"/B}"
> BA
> A
> $ v='A' p='%'; echo "${v/$p/B}"; echo "${v/"$p"/B}"
> AB
> A
>
> Of course, if $rep is not quoted, & in $rep is replaced by the matched
> part.
>
> $ echo "2:${str/$pat/$rep}"
> 2:X&AYB&Z
>
> * To properly specify an arbitrary string in the replacement, one
> needs to escape every & character in it.
>
> $ echo "${str/$pat/${rep//&/\\\\&}}"
>
> * When the replacement is not stored in a variable, one needs to
> create a variable for the replacement, i.e.,
>
> $ echo "${str/$pat/$(something)}"
>
> in Bash 5.1 needs to be converted to
>
> $ tmp=$(something)
> $ echo "${str/$pat/${tmp//&/\\\\&}}"
>
> in Bash 5.2.
>
> * Also, there is no way of writing it so that it works in both Bash
> 5.1 and 5.2. To make it work, one needs to switch the code
> depending on the bash version as:
>
> if ((BASH_VERSINFO[0]*10000+BASH_VERSINFO[1]*100>=50200)); then
>   echo "${str/$pat/${rep//&/\\\\&}}"
> else
>   echo "${str/$pat/$rep}"
> fi
>
> [ Note: this does not work for the devel branch because the devel
> branch still has the version 5.1. ]
>
> ----------------------------------------------------------------------
> 2. How to insert a literal backslash before the matched part?
>
> Another problem is that one cannot put a literal backslash just before
> & without affecting the meaning of &. Currently if there is any
> backslash before &, & will lose the special meaning and the two
> characters '\&' become '&' after the replacement.
I agree that just as \& allows a literal `&', \\ should be a literal
backslash.
> ----------------------------------------------------------------------
> Suggestion / Discussion
>
> I suggest that '&' has the meaning of the matched part only when it is
> not quoted in the parameter-expansion context ${...} [ Note that
> currently, '&' has the meaning of the matched part when it is not
> quoted by backslash in *the expanded result* ]. I expect the
> following interpretations with this suggestion:
The quoting outside the ${...} doesn't affect whether REP is quoted. This
is consistent with how POSIX specifies the pattern removal expansions, and
how bash has worked since bash-4.3.
So both of these, for instance, will expand to `&' *because of how bash
already works*, regardless of whether or not we attach meaning to `&' in
the replacement string.
> $ echo "${var/$pat/&}" # & represents the matched part
> $ echo "${var/$pat/\&}" # & is treated as a literal ampersand
This next one will expand to `\&' again due to existing behavior,
regardless of what we do with it, due to how quote removal works.
And so on.
> $ echo "${var/$pat/\\&}" # A literal backslash plus the matched part
> $ echo "${var/$pat/'\'&}" # A literal backslash plus the matched part
> $ rep='A&B'
> $ echo "${var/$pat/$rep}" # 'A' plus the matched part plus 'B'
> $ echo "${var/$pat/"$rep"}" # Literal 'A&B'
Rather than dance around behind the scenes trying to invisibly quote &,
but only in certain contexts where it would not otherwise be escaped by
double quoting, I would be more in favor of adding an option to enable the
feature and allowing the normal rules of double quoted strings to apply.
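
The existing quote-removal behavior described above can be checked with the & feature out of the picture. This is a sketch with illustrative variable names. It assumes the option name patsub_replacement from released bash 5.2 and ignores the shopt failure on shells that lack it.

```shell
#!/usr/bin/env bash
# Assumption: patsub_replacement is the option name (released bash 5.2).
# Turning it off, or not having it, leaves only ordinary quote removal.
shopt -u patsub_replacement 2>/dev/null || true

var=abc pat=b

# The replacement word undergoes quote removal even though the whole
# expansion sits inside double quotes.
one="${var/$pat/\&}"     # backslash removed, leaving a literal &
two="${var/$pat/\\&}"    # \\ becomes \, followed by a literal &

printf '%s\n' "$one" "$two"
# a&c
# a\&c
```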
>
> Here is the rationale:
>
> * It is consistent with the treatment of the glob special characters
> and anchors # and % in $pat of ${var/$pat}.
Yeah, doing that was probably a mistake, but we have to live with it now.
Those are really part of the pattern operator itself, not properties of
the pattern. But nevertheless.
> * One can intuitively quote & to make it a literal ampersand. The
> distinction of the special & in ${var/$pat/&} and the literal
> ampersand in ${var/$pat/\&} is more intuitive than ${var/$pat/&} vs
> ${var/$pat/\\&}.
Not if you take into account the word expansions the replacement string
undergoes. For example, if you use ${var/$pat/\&} in bash-5.1, you're going
to get a `&' in the output, not `\&'. Now you invite the questions of why
bash expands things differently depending on whether there is a `&' in the
replacement string, and, since bash-5.1 (where `&' is not special) expanded
that to `&', why bash-5.2 should not treat it as a replacement.
I guess the question is why not let the normal shell word expansion rules
apply, and work with the result.
> ----------------------------------------------------------------------
> Bash version of devel branch?
>
> By the way, when would the BASH_VERSINFO be updated? The devel
> version still has the Bash version 5.1. I would like to reference the
> version information to switch the implementation. In particular,
> since some incompatible changes are introduced in the devel branch
> (which are supposed to be released as Bash 5.2), I need to switch the
> implementation.
That's what I do when I need to.
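
Since the devel branch still reports 5.1 in BASH_VERSINFO, one alternative to comparing version numbers is to probe the behavior directly. The sketch below is only an illustration; the function name amp_is_special is hypothetical and not from this thread.

```shell
#!/usr/bin/env bash
# amp_is_special (hypothetical helper): succeeds if an unquoted & in the
# replacement expands to the matched text in the running shell.
amp_is_special() {
  local probe=a
  # If & is special, the match "a" is substituted back and the result is "a";
  # otherwise the result is a literal "&".
  [[ ${probe/a/&} == a ]]
}

if amp_is_special; then
  echo "& expands to the matched text in this shell"
else
  echo "& is an ordinary character in this shell"
fi
```

A script can branch on this check once at startup and pick the matching implementation, independent of what the version variables report.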
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU chet@case.edu http://tiswww.cwru.edu/~chet/