Re: [bug-gawk] Is \x24 the literal dollar character?
From: Kozics Peter (FM)
Subject: Re: [bug-gawk] Is \x24 the literal dollar character?
Date: Tue, 15 Oct 2019 21:27:58 +0200
User-agent: Evolution 3.32.4 (3.32.4-1.fc30)
That is a shade better still.
Thank you all again.
PK
On Mon, 2019-10-14 at 20:06 +0000, Tom Gray wrote:
> You can also put the $ in a bracket expression.
>
> gawk '/[$]Id: .*[$]/ {print}' <<< '$Id: rcsid$'
>
> Most metacharacters lose their special meaning inside the brackets,
> so you can avoid all the escaping.
>
> Tom
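
The bracket-expression form above can be tried as a small shell sketch. The function name match_rcs_id and the awk_bin fallback (plain awk when gawk is not installed) are assumptions made for illustration and are not part of the thread. Bracket expressions are standard POSIX regexp syntax, so the result should not depend on the awk flavour.

```shell
# Assumption: prefer gawk, fall back to whatever awk is on PATH.
awk_bin=$(command -v gawk || command -v awk)

# Hypothetical helper: print lines that contain an expanded RCS Id keyword.
# [$] is a bracket expression holding only a dollar sign, so it matches a
# literal "$" and needs no backslash.
match_rcs_id() {
  "$awk_bin" '/[$]Id: .*[$]/ { print }'
}

printf '%s\n' '$Id: rcsid$' 'Id: no dollars here' | match_rcs_id
```

Only the first of the two input lines should be printed, since the second contains no dollar sign.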
>
> -----Original Message-----
> From: bug-gawk <bug-gawk-bounces+tom_gray=address@hidden> On Behalf Of address@hidden
> Sent: Saturday, October 12, 2019 7:46 PM
> To: address@hidden; address@hidden
> Subject: Re: [bug-gawk] Is \x24 the literal dollar character?
>
> Hi.
>
> I looked into this. Please see the sidebar in that same section of
> the manual that you cited:
>
> > @sidebar Escape Sequences for Metacharacters
> > @cindex metacharacters @subentry escape sequences for
> >
> > Suppose you use an octal or hexadecimal escape to represent a regexp
> > metacharacter.
> > (See @ref{Regexp Operators}.)
> > Does @command{awk} treat the character as a literal character or as a
> > regexp operator?
> >
> > @cindex dark corner @subentry escape sequences @subentry for metacharacters
> > Historically, such characters were taken literally.
> > @value{DARKCORNER}
> > However, the POSIX standard indicates that they should be treated as
> > real metacharacters, which is what @command{gawk} does.
> > In compatibility mode (@pxref{Options}), @command{gawk} treats the
> > characters represented by octal and hexadecimal escape sequences
> > literally when used in regexp constants. Thus, @code{/a\52b/} is
> > equivalent to @code{/a\*b/}.
> > @end sidebar
>
> In short, with --traditional, you'll get the behavior you're looking
> for. Otherwise, gawk is following POSIX and treating such characters
> as real metacharacters.
>
> To solve your problem, you can do something like:
>
> gawk '$0 ~ ("\\$" "Id: .*" "\\$") {print}' <<< '$Id: rcsid$'
>
> HTH,
>
> Arnold
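
The doubled backslashes in the dynamic regexp above can be made visible with a short sketch. The helper names show_dynamic_regexp and match_dynamic_regexp, the variable re, and the awk_bin fallback are assumptions for illustration only. The idea is that awk first processes the string constant, turning "\\$" into a backslash followed by a dollar sign, and only then hands the concatenated string to the regexp matcher.

```shell
# Assumption: prefer gawk, fall back to whatever awk is on PATH.
awk_bin=$(command -v gawk || command -v awk)

# Print the regexp text that results after string-level escape processing.
show_dynamic_regexp() {
  "$awk_bin" 'BEGIN { re = "\\$" "Id: .*" "\\$"; print re }'
}

# Use the same concatenated string as a dynamic regexp.
match_dynamic_regexp() {
  "$awk_bin" 'BEGIN { re = "\\$" "Id: .*" "\\$" } $0 ~ re { print }'
}

show_dynamic_regexp
printf '%s\n' '$Id: rcsid$' 'no keyword here' | match_dynamic_regexp
```

The first command should print \$Id: .*\$, which is the regexp from case (1) of the original report, and the second should print only the line containing the keyword. Because the dollar sign and "Id" sit in separate string constants, the program text never contains them side by side.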
>
> "Kozics Peter (FM)" <address@hidden> wrote:
>
> > Dear,
> >
> >
> > (1)
> > this matches:
> > $ gawk '/\$Id: .*\$/ {print}' <<< '$Id: rcsid$'
> > $Id: rcsid$
> >
> > (2)
> > I expected that this would match as well, but it didn't:
> > $ gawk '/\x24Id: .*\x24/ {print}' <<< '$Id: rcsid$'
> >
> > The expectation was based on gawk manual section 3.2: \x24 should be
> > the literal dollar character, not the dollar metacharacter.
> >
> > (3)
> > Now, let's go on, this does not match either:
> > $ gawk '/\\x24Id: .*\\x24/ {print}' <<< '$Id: rcsid$'
> >
> > (4)
> > And this one still does not:
> > $ gawk '/\\\x24Id: .*\\\x24/ {print}' <<< '$Id: rcsid$'
> >
> > (5)
> > At long last, this matches again:
> > $ gawk '/\x5c\x24Id: .*\x5c\x24/ {print}' <<< '$Id: rcsid$'
> > $Id: rcsid$
> >
> > which looks awkward and quite counterintuitive to me.
> >
> > -------------
> > The problem with (1) is that when the regexp is in a file under RCS
> > control, RCS will destroy the regexp upon checkout by performing a
> > keyword substitution. So the straightforward and seemingly
> > manual-compliant solution would be (2), which unfortunately does not
> > work.
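
Whether a given program text is exposed to this substitution can be approximated with a quick check. This is a rough sketch: the function name rcs_would_expand_id is hypothetical, and the grep pattern only approximates the keyword forms that RCS recognizes (a dollar sign, Id, an optional colon-prefixed value, and a closing dollar sign on the same line).

```shell
# Hypothetical helper: succeed if the text contains something shaped like
# an RCS Id keyword, that is "$Id$" or "$Id: ...$" on a single line.
rcs_would_expand_id() {
  printf '%s\n' "$1" | grep -Eq '[$]Id(:[^$]*)?[$]'
}

# The three program texts discussed in this thread.
for prog in '/\$Id: .*\$/' '/[$]Id: .*[$]/' '$0 ~ ("\\$" "Id: .*" "\\$")'; do
  if rcs_would_expand_id "$prog"; then
    printf 'at risk: %s\n' "$prog"
  else
    printf 'safe:    %s\n' "$prog"
  fi
done
```

Under this approximation only the backslash-escaped form from (1) is flagged. The bracket-expression form and the split string constants never place a dollar sign directly in front of Id.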
> >
> > I wonder whether I have found a gawk bug, a flaw in the regexp /
> > literal / meta concept, or a vague place in the gawk manual. Or have
> > I just misunderstood something?
> >
> > -------------
> > OS:
> > $ uname -a
> > Linux gygv 5.2.18-200.fc30.x86_64 #1 SMP Tue Oct 1 13:14:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> >
> > gawk:
> > $ gawk --version
> > GNU Awk 4.2.1, API: 2.0 (GNU MPFR 3.1.6-p2, GNU MP 6.1.2)
> > Copyright (C) 1989, 1991-2018 Free Software Foundation.
> >
> > gawk manual:
> > This is Edition 4.2 of GAWK: Effective AWK Programming
> >
> >
> > yours
> > KP
> >
>
>