
Re: [bug-gawk] Is \x24 the literal dollar character?


From: Wolfgang Laun
Subject: Re: [bug-gawk] Is \x24 the literal dollar character?
Date: Tue, 15 Oct 2019 22:24:26 +0200

Another old trick to hide substitutions from some pre-processor is to
"interrupt" the quoted string:

gawk '/\$''Id: .*\$/ {print}' <<< '$Id: rcsid$'
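
A minimal sketch of why the interrupted quoting is invisible to awk: the shell joins adjacent quoted pieces before awk starts, so only the source file contains the break that hides the keyword from the pre-processor. The dollar signs are backslash-escaped here, since an unescaped $ inside a regexp is still an end-of-string anchor. The AWK variable and its fallback to plain awk are assumptions of this sketch, not part of the thread.

```shell
# Assumption: use gawk when it is installed, otherwise whatever awk is on PATH.
AWK=gawk; command -v gawk >/dev/null 2>&1 || AWK=awk

# The shell concatenates the two quoted pieces; the inner '' disappears.
prog='/\$''Id: .*\$/ {print}'
printf '%s\n' "$prog"     # the program text that awk actually receives

# The test input is split the same way, so this file holds no intact keyword.
out=$(printf '%s\n' '$''Id: rcsid$' | "$AWK" "$prog")
printf '%s\n' "$out"
```

The first printf shows the program with the two pieces already joined; the second shows the matching input line.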



On Tue, 15 Oct 2019 at 21:28, Kozics Peter (FM) <address@hidden>
wrote:

> One shade even better.
>
> Thank you all again.
>
> PK
>
>
> On Mon, 2019-10-14 at 20:06 +0000, Tom Gray wrote:
> > You can also put the $ in a bracket expression.
> >
> > gawk '/[$]Id: .*[$]/ {print}' <<< '$Id: rcsid$'
> >
> > Most metacharacters lose their special meaning inside the brackets,
> > so you can avoid all the escaping.
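
A short sketch of the bracket-expression approach, assuming a POSIX-style awk. Inside brackets, $ and * stand for themselves; note that ^ in first position, ], - and the backslash can still be special there. The AWK variable and the second test pattern are illustrative additions, not from the thread.

```shell
# Assumption: use gawk when it is installed, otherwise whatever awk is on PATH.
AWK=gawk; command -v gawk >/dev/null 2>&1 || AWK=awk

# [$] matches a literal dollar sign, so only the first input line is printed.
hit=$(printf '%s\n' '$''Id: rcsid$' 'Id: rcsid' | "$AWK" '/[$]Id: .*[$]/ {print}')

# The same holds for other operators: [*] is a literal asterisk.
star=$(printf '%s\n' 'a*b' 'aab' | "$AWK" '/a[*]b/ {print}')

printf '%s\n' "$hit" "$star"
```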
> >
> > Tom
> >
> > -----Original Message-----
> > From: bug-gawk <bug-gawk-bounces+tom_gray=address@hidden> On
> > Behalf Of address@hidden
> > Sent: Saturday, October 12, 2019 7:46 PM
> > To: address@hidden; address@hidden
> > Subject: Re: [bug-gawk] Is \x24 the literal dollar character?
> >
> > [EXTERNAL]
> >
> > Hi.
> >
> > I looked into this. Please see the sidebar in that same section of
> > the manual that you cited:
> >
> > > @sidebar Escape Sequences for Metacharacters
> > > @cindex metacharacters @subentry escape sequences for
> > >
> > > Suppose you use an octal or hexadecimal escape to represent a regexp
> > > metacharacter.
> > > (See @ref{Regexp Operators}.)
> > > Does @command{awk} treat the character as a literal character or as a
> > > regexp operator?
> > >
> > > @cindex dark corner @subentry escape sequences @subentry for metacharacters
> > > Historically, such characters were taken literally.
> > > @value{DARKCORNER}
> > > However, the POSIX standard indicates that they should be treated as
> > > real metacharacters, which is what @command{gawk} does.
> > > In compatibility mode (@pxref{Options}), @command{gawk} treats the
> > > characters represented by octal and hexadecimal escape sequences
> > > literally when used in regexp constants. Thus, @code{/a\52b/} is
> > > equivalent to @code{/a\*b/}.
> > > @end sidebar
> >
> > In short, with --traditional, you'll get the behavior you're looking
> > for. Otherwise, gawk is following POSIX and treating such characters
> > as real metacharacters.
> >
> > To solve your problem, you can do something like:
> >
> >         gawk '$0 ~ ("\\$" "Id: .*" "\\$") {print}' <<< '$Id: rcsid$'
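
A hedged sketch of what the dynamic regexp above does, assuming a POSIX-style awk. String processing first reduces each "\\$" to the two characters \$, and the regexp engine then reads that as an escaped, literal dollar sign. Splitting the pattern into three strings also keeps an intact keyword out of the source file. The AWK variable and the BEGIN rule that prints the pattern are additions for illustration only.

```shell
# Assumption: use gawk when it is installed, otherwise whatever awk is on PATH.
AWK=gawk; command -v gawk >/dev/null 2>&1 || AWK=awk

# Print the regexp text that results from concatenating the three strings.
re=$("$AWK" 'BEGIN { print "\\$" "Id: .*" "\\$" }')

# Match with the same concatenated string used as a dynamic regexp.
out=$(printf '%s\n' '$''Id: rcsid$' | "$AWK" '$0 ~ ("\\$" "Id: .*" "\\$") {print}')

printf '%s\n' "$re" "$out"
```

The first line printed is the regexp as the engine sees it, with a single backslash before each dollar sign; the second is the matched input line.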
> >
> > HTH,
> >
> > Arnold
> >
> > "Kozics Peter (FM)" <address@hidden> wrote:
> >
> > > Dear,
> > >
> > >
> > > (1)
> > > this matches:
> > > $ gawk '/\$Id: .*\$/ {print}' <<< '$Id: rcsid$'
> > > $Id: rcsid$
> > >
> > > (2)
> > > I expected that this would match as well, but it didn't:
> > > $ gawk '/\x24Id: .*\x24/ {print}' <<< '$Id: rcsid$'
> > >
> > > The expectation was based on gawk manual section 3.2: \x24 should be
> > > the literal dollar character, not the dollar metacharacter.
> > >
> > > (3)
> > > Now, let's go on, this does not match either:
> > > $ gawk '/\\x24Id: .*\\x24/ {print}' <<< '$Id: rcsid$'
> > >
> > > (4)
> > > And this one still does not:
> > > $ gawk '/\\\x24Id: .*\\\x24/ {print}' <<< '$Id: rcsid$'
> > >
> > > (5)
> > > At long last, this matches again:
> > > $ gawk '/\x5c\x24Id: .*\x5c\x24/ {print}' <<< '$Id: rcsid$'
> > > $Id: rcsid$
> > >
> > > which looks to me awkward and quite counterintuitive.
> > >
> > > -------------
> > > The problem with (1) is that when the regexp is in a file under RCS
> > > control, RCS will destroy the regexp upon checkout by performing a
> > > keyword substitution. So the straightforward and seemingly
> > > manual-compliant solution would be (2), which unfortunately does not
> > > work.
> > >
> > > I wonder if I found a gawk bug or a flaw in the regexp / literal /
> > > meta concept or a vague place in the gawk manual. Or just
> > > misunderstood something?
> > >
> > > -------------
> > > OS:
> > > $ uname -a
> > > Linux gygv 5.2.18-200.fc30.x86_64 #1 SMP Tue Oct 1 13:14:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> > >
> > > gawk:
> > > $ gawk --version
> > > GNU Awk 4.2.1, API: 2.0 (GNU MPFR 3.1.6-p2, GNU MP 6.1.2)
> > > Copyright (C) 1989, 1991-2018 Free Software Foundation.
> > >
> > > gawk manual:
> > > This is Edition 4.2 of GAWK: Effective AWK Programming
> > >
> > >
> > > yours
> > > KP
> > >

