bug-gawk
Re: Misunderstood, bug or limitation of indexing ENVIRON with "\\1" in g


From: Vincent Férotin
Subject: Re: Misunderstood, bug or limitation of indexing ENVIRON with "\\1" in gensub() ?
Date: Mon, 6 Apr 2020 14:22:05 +0200

Hey Wolfgang, thank you very much for the detailed answer!

You understood my needs perfectly, and I greatly appreciate your
proposed solutions. :-)

Beyond my small and rather anecdotal needs (and I now understand that awk,
in its current state, does not work as I had expected), one minor intent of
my previous message to the bug-gawk mailing list was to ask you, the
developers, whether such a feature would be desirable in a future version
of awk. That is, should a future awk be able to do string interpolation
and in-place evaluation, and interpret "\\1" occurrences anywhere in a
g(en)sub call rather than only in the replacement string?
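
To make the current behaviour concrete, here is a minimal sketch of why the
index never reaches g(en)sub as a back-reference: awk evaluates every argument
before calling the function, so the subscript "&" (or "\\1") is just a literal
key. The array `tbl` and the shell function name `demo_eval_order` are made up
for this illustration; `tbl` only stands in for ENVIRON.

```sh
# Sketch: arguments are evaluated before gsub() runs, so tbl["&"] is looked
# up with the literal one-character key "&", not with the matched text.
demo_eval_order() {
    LC_ALL=C awk 'BEGIN {
        tbl["&"] = "[&]"              # value stored under the literal key "&"
        s = "a B c"
        gsub(/[A-Z]+/, tbl["&"], s)   # equivalent to gsub(/[A-Z]+/, "[&]", s)
        print s
    }'
}
demo_eval_order   # prints: a [B] c
```

Only the string that results from the lookup is passed to gsub(), which is
why an unset ENVIRON["&"] yields an empty replacement.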

Anyway, thanks again!

V.F.



On Sat, 4 Apr 2020 at 06:41, Wolfgang Laun <address@hidden> wrote:
>
>
> If I understand everything correctly, you are trying to replace some
>    %abc%
> in an input line by the value of the environment variable abc.
>
> This cannot be done using a single gsub, because the backref \\1 only works 
> within a string literal that is to be the complete replacement text. What you 
> need is an additional evaluation, of the expression "ENVIRON["  "\\1"  "]", 
> to be inserted in place of the %-% placeholder. If awk had eval, you could 
> write:
>
>      print gensub(/%([_A-Z]+)%/, eval("ENVIRON[\"\\1\"]"), "g")   # this is not awk
>
> You might use Perl, where substitution (s///) has a flag 'e', requesting the 
> replacement to be evaluated as an expression to become the text to be 
> inserted, i.e., an implied eval.
>
> echo 'repository=%MY_URL%  # %COMMENT%'  | \
>    COMMENT="Set repository URL" MY_URL="http://www.example.com" \
>    perl -e 'while( <> ){s/%([_A-Z]+)%/$ENV{$1}/ge; print;}'
>
> An awk version requires a user-defined function:
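> # gawk only: the three-argument match() fills chunk[] with the groups.
> function envsub(text,  chunk){
>     # Replace one %NAME% placeholder per iteration until none is left.
>     while( match(text, /([^%]*)%([_A-Z]+)%(.*)/, chunk) != 0 ){
>        sub( "%"chunk[2]"%", ENVIRON[chunk[2]], text )
>     }
>     return text;
> }
> { print envsub($0) }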
>
> Wolfgang
>
>
> On Fri, 3 Apr 2020 at 18:35, Vincent Férotin <address@hidden> wrote:
>>
>> Hi gawk maintainers!
>>
>> New to awk/gawk/mawk, I'd like to describe here what could be a bug,
>> or at least a limitation, that I encountered in these tools for my basic usage.
>> Perhaps what follows is not a bug but a misunderstanding on my part as a newbie?
>> Anyway, thanks in advance for reading this...
>>
>> V.F.
>>
>>
>> TL;DR
>> =====
>>
>> Using [gm]awk as a templating/macro engine, the following shell commands
>> do not output what I expected:
>>
>>     $ echo "repository=%MY_URL%  # %COMMENT%" | \
>>         COMMENT="Set repository URL" MY_URL="http://www.example.com" \
>>         awk '{print gensub(/%([_A-Z]+)%/, ENVIRON["\\1"], "g")}'
>>     repository=  #
>>
>> or roughly equivalent:
>>
>>     $ echo "repository=MY_URL  # COMMENT" | \
>>         COMMENT="Set repository URL" MY_URL="http://www.example.com" \
>>         awk '{gsub(/[_A-Z]+/, ENVIRON["&"]); print $0}'
>>     repository=  #
>>
>> It seems that "\\1" in gensub() (or "&" in gsub()) is not replaced by
>> the text the regexp captured, at least when it is used as an index
>> into ENVIRON. The expected output, IMHO and as far as I understand,
>> would be:
>>
>>     repository=http://www.example.com  # Set repository URL
>>
>>
>> Versions tested
>> ===============
>>
>> * gawk:
>>   - 4.1.4 (Ubuntu 18.04 Bionic)
>>   - 4.2.1 (Ubuntu 19.10 Eoan)
>>   - 5.0.1 (Ubuntu 20.04 Focal)
>> * mawk:
>>   - 3.3 (Ubuntu 18.04 Bionic & 19.10 Eoan)
>>   - 3.4.20200120 (Ubuntu 20.04 Focal)
>>
>>
>> Usage
>> =====
>>
>> To provision some virtual machines with Bash scripts,
>> I used 'sed' to replace paths (strings) or
>> configuration file contents, but it failed in cases where the replacement
>> string contains characters that 'sed' interprets as metacharacters (such as "/").
>>
>> I then tried 'm4', with the effective values for the placeholders
>> available as environment variables.
>> But the Debian/Ubuntu packaging seems to have some limitations, notably
>> the disabled '-W, --word-regexp=REGEXP' option (which should allow setting
>> the placeholder regexp., e.g. "%([_A-Z]+)%").
>> Using m4 as is, with the configuration chosen by the packaging maintainers,
>> is feasible:
>>
>>     $ echo "changecom\nrepository=MY_URL  # COMMENT" | \
>>         m4 -DMY_URL="$MY_URL" -DCOMMENT="$COMMENT"
>>
>>     repository=http://www.example.com  # Set repository URL
>>
>> but I miss being able to choose more robust placeholder delimiters
>> (I started here by prefixing and suffixing them with "%",
>> but I could also have chosen another format, such as the more common
>> "${var}").
>>
>> It seems that this need exists beyond my own naïve usage;
>> see for example:
>> - https://stackoverflow.com/questions/415677/how-to-replace-placeholders-in-a-text-file
>> - https://stackoverflow.com/questions/2914220/bash-templating-how-to-build-configuration-files-from-templates-with-bash
>>
>> Note that, apart from a single answer (out of a total of 40 (16+24) at the
>> time of this writing):
>> - https://stackoverflow.com/questions/2914220/bash-templating-how-to-build-configuration-files-from-templates-with-bash#answer-9590655
>> no valid answer uses awk or one of its derivatives!
>> (NB: This specific answer could probably suffice for my needs...)
>>
>>
>> Evidence that `gensub(..., ENVIRON["\\1"])` should work
>> =======================================================
>>
>> Using "\\1" in the replacement string of gensub() works as expected:
>>
>>     $ echo "repository=%MY_URL%  # %COMMENT%" | \
>>         awk '{print gensub(/%([_A-Z]+)%/, "( \\1 )", "g")}'
>>     repository=( MY_URL )  # ( COMMENT )
>>
>> Passing the desired variable name directly to ENVIRON also works:
>>
>>     $ echo "repository=%MY_URL%" | MY_URL="http://www.example.com" \
>>         awk '{print gensub(/%MY_URL%/, ENVIRON["MY_URL"], "g")}'
>>     repository=http://www.example.com
>>
>>
>> `ENVIRON` seems to not accept other expressions as index
>> ========================================================
>>
>> Note also that I tried to rewrite the awk script from the StackOverflow
>> answer mentioned above
>> (https://stackoverflow.com/questions/2914220/bash-templating-how-to-build-configuration-files-from-templates-with-bash#answer-9590655),
>> that is:
>>
>>     'match($0, "[$]{.*}") {
>>          var = substr($0, (RSTART + 2), (RLENGTH - 3))
>>          gsub("[$]{"var"}", ENVIRON[var])
>>      }1'
>>
>> into something more condensed and adapted to my use case:
>>
>>     '{gensub(/%([_A-Z]+)%/, ENVIRON[substr("\\1", 1, (length("\\1") - 2))])}'  # gawk
>>     '{gsub(/%[_A-Z]+%/, ENVIRON[substr("&", 1, (length("&") - 1))]); print $0}'  # mawk
>>
>> This does not work either.
>>
>>
>> Search for previous existing occurrences of `gensub(..., ENVIRON["\\1"])`
>> ========================================================================
>>
>> No occurrence of ``ENVIRON[`` with an index other than a plain
>> string or variable was found in:
>>
>> * `sed and awk Pocket Reference` by Arnold Robbins (O'Reilly, 2002, 2nd ed.)
>>     http://shop.oreilly.com/product/9780596003524.do
>> * `sed & awk` by Dale Dougherty & Arnold Robbins (O'Reilly, 1997, 2nd ed.)
>>     http://shop.oreilly.com/product/9781565922259.do
>> * `Effective awk Programming` by Arnold Robbins (O'Reilly, 2015, 4th ed.)
>>     http://shop.oreilly.com/product/0636920033820.do
>> * `GNU awk - awesome one-liners` by Sundeep Agarwal (version 0.7)
>>     https://learnbyexample.github.io/books/
>>     (recently mentioned on Hacker News:
>>     https://news.ycombinator.com/item?id=22758217 )
>> * `bug-gawk` archives
>>     https://lists.gnu.org/archive/html/bug-gawk/
>>
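
For reference, the looping approach quoted above can also be written without
gawk extensions. The following is only a sketch, assuming POSIX awk and
%NAME% placeholders; the shell function name `expand_env` is hypothetical.
It builds the result with substr() instead of sub(), so an "&" or a backslash
in a value is copied literally, and placeholders whose variable is unset are
left untouched (a design choice of this sketch, not something the thread
specifies).

```sh
# expand_env: hypothetical helper, reads stdin and expands %NAME% from ENVIRON.
expand_env() {
    LC_ALL=C awk '
    function envsub(text,    out, name) {
        out = ""
        while (match(text, /%[_A-Z]+%/)) {
            name = substr(text, RSTART + 1, RLENGTH - 2)   # strip both "%"
            if (name in ENVIRON)
                out = out substr(text, 1, RSTART - 1) ENVIRON[name]
            else
                out = out substr(text, 1, RSTART + RLENGTH - 1)  # keep as is
            text = substr(text, RSTART + RLENGTH)   # continue after the match
        }
        return out text
    }
    { print envsub($0) }'
}

export MY_URL="http://www.example.com" COMMENT="Set repository URL"
echo 'repository=%MY_URL%  # %COMMENT%' | expand_env
# prints: repository=http://www.example.com  # Set repository URL
```

Because the already expanded part is moved into `out`, a value that itself
contains something like %FOO% is not expanded a second time.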


