Misunderstood, bug or limitation of indexing ENVIRON with "\\1" in gensu
From: Vincent Férotin
Subject: Misunderstood, bug or limitation of indexing ENVIRON with "\\1" in gensub() ?
Date: Fri, 3 Apr 2020 17:37:44 +0200
Hi gawk maintainers!
New to awk/gawk/mawk, I'd like to describe what could be a bug, or at
least a limitation, that I encountered in these tools in my basic usage.
Perhaps what follows is not a bug but a misunderstanding on my part, as a
newbie?
Anyway, thanks in advance for reading this...
V.F.
TL;DR
=====
When using [gm]awk as a templating/macro engine, the following shell commands
do not output what I expected:
$ echo "repository=%MY_URL% # %COMMENT%" |
    COMMENT="Set repository URL" MY_URL="http://www.example.com" \
    awk '{print gensub(/%([_A-Z]+)%/, ENVIRON["\\1"], "g")}'
repository= #
or roughly equivalent:
$ echo "repository=MY_URL # COMMENT" |
    COMMENT="Set repository URL" MY_URL="http://www.example.com" \
    awk '{gsub(/[_A-Z]+/, ENVIRON["&"]); print $0}'
repository= #
It seems that "\\1" in gensub() (or "&" in gsub()) is not replaced by the
text the regexp captured, at least when it is used to index ENVIRON.
The expected output would be, IMHO and as far as I understand:
repository=http://www.example.com # Set repository URL
Versions tested
===============
* gawk:
- 4.1.4 (Ubuntu 18.04 Bionic)
- 4.2.1 (Ubuntu 19.10 Eoan)
- 5.0.1 (Ubuntu 20.04 Focal)
* mawk:
- 1.3.3 (Ubuntu 18.04 Bionic & 19.10 Eoan)
- 1.3.4.20200120 (Ubuntu 20.04 Focal)
Usage
=====
To provision some virtual machines with Bash scripts, I used 'sed' to
replace some paths (strings) or configuration file contents, but it failed
in some cases where the replacement string contains characters that 'sed'
interprets as metacharacters (such as "/").
I then tried 'm4', with the values that replace the placeholders
available as environment variables.
But the Debian/Ubuntu packaging seems to have some limitations, notably
the disabled '-W, --word-regexp=REGEXP' option (which would allow setting
the placeholder regexp, e.g. "%([_A-Z]+)%").
Using m4 as is, with the configuration chosen by the packaging maintainers,
is feasible:
$ printf 'changecom\nrepository=MY_URL # COMMENT\n' |
    m4 -DMY_URL="$MY_URL" -DCOMMENT="$COMMENT"
repository=http://www.example.com # Set repository URL
but then I cannot choose more robust placeholder delimiters
(I started here by prefixing and suffixing them with "%",
but I could also have chosen another format, such as the more common "${var}").
This need seems to exist beyond my own naïve usage; see for example:
- https://stackoverflow.com/questions/415677/how-to-replace-placeholders-in-a-text-file
- https://stackoverflow.com/questions/2914220/bash-templating-how-to-build-configuration-files-from-templates-with-bash
Note that, apart from a single answer (out of 40 in total, 16+24 at the time
of this writing):
- https://stackoverflow.com/questions/2914220/bash-templating-how-to-build-configuration-files-from-templates-with-bash#answer-9590655
no valid answer uses awk or one of its derivatives!
(NB: This specific answer could probably suffice for my needs...)
Evidence that `gensub(..., ENVIRON["\\1"])` should work
=======================================================
A "\\1" in the gensub() replacement string is substituted as expected:
$ echo "repository=%MY_URL% # %COMMENT%" |
    awk '{print gensub(/%([_A-Z]+)%/, "( \\1 )", "g")}'
repository=( MY_URL ) # ( COMMENT )
Passing the desired variable name directly to ENVIRON also works:
$ echo "repository=%MY_URL%" |
    MY_URL="http://www.example.com" \
    awk '{print gensub(/%MY_URL%/, ENVIRON["MY_URL"], "g")}'
repository=http://www.example.com
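
The two results above are consistent with awk evaluating every argument
before the function is called: ENVIRON["\\1"] would then be looked up once,
with the literal two-character key \1, and gensub() would only ever see the
(normally empty) result. A small sketch that makes this visible, assuming
only POSIX awk features (gsub() is used instead of gensub() so that it also
runs under mawk); the variable name `show_eval_order` and the placeholder
value are made up for this illustration:

```shell
# The subscript "\\1" is an ordinary awk string: the two characters \ and 1.
show_eval_order='
BEGIN { ENVIRON["\\1"] = "<looked up once>" }   # illustrative value only
{ gsub(/%[_A-Z]+%/, ENVIRON["\\1"]); print }
'
echo "repository=%MY_URL% # %COMMENT%" | awk "$show_eval_order"
# prints: repository=<looked up once> # <looked up once>
```

Both placeholders receive the same text, whatever name was captured, which
suggests the array lookup happens before any matching takes place.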
`ENVIRON` seems to not accept other expressions as index
========================================================
Note also that rewriting the awk script provided by the StackOverflow answer
mentioned above,
https://stackoverflow.com/questions/2914220/bash-templating-how-to-build-configuration-files-from-templates-with-bash#answer-9590655
that is:
'match($0, "[$]{.*}") {var = substr($0, (RSTART + 2), (RLENGTH - 3)); gsub("[$]{"var"}", ENVIRON[var])}1'
into a more condensed form adapted to my use case:
'{gensub(/%([_A-Z]+)%/, ENVIRON[substr("\\1", 1, (length("\\1") - 2))])}' # gawk
'{gsub(/%[_A-Z]+%/, ENVIRON[substr("&", 1, (length("&") - 1))]); print $0}' # mawk
does not work either.
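
One approach that does produce the expected output is to extract the
placeholder name with match() and substr() first, and only then index
ENVIRON with it. A minimal sketch, assuming %NAME% placeholders and only
POSIX awk features (so it should behave the same in gawk and mawk); the
shell variable name `expand` is made up for this illustration:

```shell
# Hypothetical awk program (the name "expand" is illustrative only).
expand='{
    out = ""
    # match() sets RSTART and RLENGTH for the leftmost placeholder in $0
    while (match($0, /%[_A-Z]+%/)) {
        name = substr($0, RSTART + 1, RLENGTH - 2)        # drop both "%"
        out = out substr($0, 1, RSTART - 1) ENVIRON[name] # text before + value
        $0 = substr($0, RSTART + RLENGTH)                 # keep scanning the rest
    }
    print out $0
}'

echo "repository=%MY_URL% # %COMMENT%" |
    COMMENT="Set repository URL" MY_URL="http://www.example.com" awk "$expand"
# prints: repository=http://www.example.com # Set repository URL
```

Because the value is concatenated rather than passed as a replacement
string, characters such as "&" or "/" in the environment value are copied
literally, and an unset variable expands to an empty string.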
Search for previous existing occurrences of `gensub(..., ENVIRON["\\1"])`
========================================================================
No occurrence of ``ENVIRON[`` with an index other than a plain string or a
variable was found in:
* `sed and awk Pocket Reference` by Arnold Robbins (O'Reilly, 2002, 2nd ed.)
http://shop.oreilly.com/product/9780596003524.do
* `sed & awk` by Dale Dougherty & Arnold Robbins (O'Reilly, 1997, 2nd ed.)
http://shop.oreilly.com/product/9781565922259.do
* `Effective awk Programming` by Arnold Robbins (O'Reilly, 2015, 4th ed.)
http://shop.oreilly.com/product/0636920033820.do
* `GNU awk - awesome one-liners` by Sundeep Agarwal (version 0.7)
https://learnbyexample.github.io/books/
(pointed recently in HackerNews:
https://news.ycombinator.com/item?id=22758217 )
* `bug-gawk` archives
https://lists.gnu.org/archive/html/bug-gawk/