Re: [bug-gawk] Is \x24 the literal dollar character?


From: Kozics Peter (FM)
Subject: Re: [bug-gawk] Is \x24 the literal dollar character?
Date: Tue, 15 Oct 2019 21:27:58 +0200
User-agent: Evolution 3.32.4 (3.32.4-1.fc30)

That is a shade better still.

Thank you all again.

PK


On Mon, 2019-10-14 at 20:06 +0000, Tom Gray wrote:
> You can also put the $ in a bracket expression.
> 
> gawk '/[$]Id: .*[$]/ {print}' <<< '$Id: rcsid$'
> 
> Most metacharacters lose their special meaning inside the brackets,
> so you can avoid all the escaping.
> 
> Tom
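
A minimal sketch of the bracket-expression approach, written for a
POSIX shell. The AWK variable and the match_rcs_id helper are
assumptions introduced for illustration; only the pattern comes from the
message above. The pattern text never has a dollar sign directly
followed by the keyword, so RCS keyword substitution should leave it
alone, though that is worth verifying on a real checkout.

```shell
# Assumption: gawk is on PATH; fall back to plain awk otherwise.
AWK=$(command -v gawk || command -v awk)

# Hypothetical helper: print the argument if it matches the bracketed pattern.
match_rcs_id() {
    printf '%s\n' "$1" | "$AWK" '/[$]Id: .*[$]/ { print }'
}

match_rcs_id '$Id: rcsid$'    # contains the keyword, so the line is printed
match_rcs_id 'Id: rcsid'      # no dollar signs, so nothing is printed

# Other metacharacters are literal inside brackets as well:
# only the lines a.b and a*b are printed, axb is not.
printf '%s\n' 'a.b' 'axb' 'a*b' | "$AWK" '/a[.*]b/ { print }'
```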
> 
> -----Original Message-----
> From: bug-gawk <bug-gawk-bounces+tom_gray=address@hidden> On
> Behalf Of address@hidden
> Sent: Saturday, October 12, 2019 7:46 PM
> To: address@hidden; address@hidden
> Subject: Re: [bug-gawk] Is \x24 the literal dollar character?
> 
> Hi.
> 
> I looked into this. Please see the sidebar in that same section of
> the manual that you cited:
> 
> > @sidebar Escape Sequences for Metacharacters
> > @cindex metacharacters @subentry escape sequences for
> >
> > Suppose you use an octal or hexadecimal escape to represent a regexp
> > metacharacter.
> > (See @ref{Regexp Operators}.)
> > Does @command{awk} treat the character as a literal character or as a
> > regexp operator?
> >
> > @cindex dark corner @subentry escape sequences @subentry for metacharacters
> > Historically, such characters were taken literally.
> > @value{DARKCORNER}
> > However, the POSIX standard indicates that they should be treated as
> > real metacharacters, which is what @command{gawk} does.
> > In compatibility mode (@pxref{Options}), @command{gawk} treats the
> > characters represented by octal and hexadecimal escape sequences
> > literally when used in regexp constants. Thus, @code{/a\52b/} is
> > equivalent to @code{/a\*b/}.
> > @end sidebar
> 
> In short, with --traditional, you'll get the behavior you're looking
> for. Otherwise, gawk is following POSIX and treating such characters
> as real metacharacters.
> 
> To solve your problem, you can do something like:
> 
>         gawk '$0 ~ ("\\$" "Id: .*" "\\$") {print}' <<< '$Id: rcsid$'
> 
> HTH,
> 
> Arnold
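
A small sketch of why the string form above works, again for a POSIX
shell. The AWK variable, the match_dynamic helper, and the awk variable
names d and re are hypothetical and not from the thread. Inside an awk
string constant, two backslashes followed by a dollar sign produce a
two-character string (backslash, dollar). When that string is used as a
dynamic regexp, it stands for a literal dollar sign.

```shell
# Assumption: gawk is on PATH; fall back to plain awk otherwise.
AWK=$(command -v gawk || command -v awk)

# Hypothetical helper mirroring the quoted command, with the pattern
# assembled in a variable instead of inline.
match_dynamic() {
    printf '%s\n' "$1" | "$AWK" '
        BEGIN {
            d  = "\\$"            # string value: backslash + dollar
            re = d "Id: .*" d     # escaped dollar, Id: .*, escaped dollar
        }
        $0 ~ re { print }'
}

match_dynamic '$Id: rcsid$'   # matches, so the line is printed
match_dynamic 'Id: rcsid'     # no dollar signs, so nothing is printed

# The escaped-dollar string really is two characters long.
"$AWK" 'BEGIN { print length("\\$") }'
```

Splitting the pattern into separate strings keeps the dollar sign and
the keyword apart in the source text, which is the point of writing it
this way for a file under RCS control.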
> 
> "Kozics Peter (FM)" <address@hidden> wrote:
> 
> > Dear,
> > 
> > 
> > (1)
> > this matches:
> > $ gawk '/\$Id: .*\$/ {print}' <<< '$Id: rcsid$'
> > $Id: rcsid$
> > 
> > (2)
> > I expected that this would match as well, but it didn't:
> > $ gawk '/\x24Id: .*\x24/ {print}' <<< '$Id: rcsid$'
> > 
> > The expectation was based on gawk manual section 3.2: \x24 should be
> > the literal dollar character, not the dollar metacharacter.
> > 
> > (3)
> > Now, let's go on, this does not match either:
> > $ gawk '/\\x24Id: .*\\x24/ {print}' <<< '$Id: rcsid$'
> > 
> > (4)
> > And this one still does not match:
> > $ gawk '/\\\x24Id: .*\\\x24/ {print}' <<< '$Id: rcsid$'
> > 
> > (5)
> > At long last, this matches again:
> > $ gawk '/\x5c\x24Id: .*\x5c\x24/ {print}' <<< '$Id: rcsid$'
> > $Id: rcsid$
> > 
> > which looks awkward and quite counterintuitive to me.
> > 
> > -------------
> > The problem with (1) is that when the regexp is in a file under RCS
> > control, RCS will destroy the regexp upon checkout by performing
> > keyword substitution. So the straightforward and seemingly
> > manual-compliant solution would be (2), which unfortunately does not
> > work.
> > 
> > I wonder whether I have found a gawk bug, a flaw in the regexp /
> > literal / meta concept, or a vague place in the gawk manual. Or have I
> > just misunderstood something?
> > 
> > -------------
> > OS:
> > $ uname -a
> > Linux gygv 5.2.18-200.fc30.x86_64 #1 SMP Tue Oct 1 13:14:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> > 
> > gawk:
> > $ gawk --version
> > GNU Awk 4.2.1, API: 2.0 (GNU MPFR 3.1.6-p2, GNU MP 6.1.2)
> > Copyright (C) 1989, 1991-2018 Free Software Foundation.
> > 
> > gawk manual:
> > This is Edition 4.2 of GAWK: Effective AWK Programming
> > 
> > 
> > yours
> > KP
> > 
> 
> 



