bug-gawk

Re: documentation around RE repetition metachars may need clarification


From: arnold
Subject: Re: documentation around RE repetition metachars may need clarification
Date: Tue, 23 May 2023 02:14:25 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

I will eventually push the patch below.

Thanks

Arnold

arnold@skeeve.com wrote:

> Hi.
>
> Thanks for the report. I will look at revising the text
> in the manual.
>
> Arnold
>
> Ed Morton <mortoneccc@comcast.net> wrote:
>
> > In the gawk manual under 
> > https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operator-Details.html
> >  
> > we have this statement:
> >
> > > In POSIX |awk| and |gawk|, the ‘*’, ‘+’, and ‘?’ operators stand for 
> > > themselves when there is nothing in the regexp that precedes them.
> >
> > while in the POSIX spec under 
> > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_03
> >  
> > we have this statement:
> >
> > > *+?{
> > >     The <asterisk>, <plus-sign>, <question-mark>, and <left-brace>
> > >     shall be special except when used in a bracket expression (see RE
> > >     Bracket Expression
> > >     <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05>).
> > >     Any of the following uses produce undefined results:
> > >
> > >       * If these characters appear first in an ERE
> > >
> >
> > So the gawk manual statement says that /+foo/ in any POSIX awk will 
> > match the literal string "+foo" while the POSIX spec statement says it's 
> > undefined behavior.
> >
> > Should the gawk manual be tweaked to clarify/explain what it currently 
> > says about POSIX awk since it apparently contradicts the POSIX spec?
> >
> >      Ed.
---------------------------------------
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index b55e8c8..cde3c22 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -5886,8 +5896,11 @@ As in arithmetic, parentheses can change how operators are grouped.
 
 @cindex POSIX @command{awk} @subentry regular expressions and
 @cindex @command{gawk} @subentry regular expressions @subentry precedence
-In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and
-@samp{?} operators stand for themselves when there is nothing in the
+According to the POSIX specification, when @samp{*}, @samp{+}, @samp{?},
+or @samp{@{} are not preceded by a character, the behavior is
+``undefined.''
+In practice, for @command{gawk}, the @samp{*}, @samp{+}, @samp{?} and
+@samp{@{} operators stand for themselves when there is nothing in the
 regexp that precedes them.  For example, @code{/+/} matches a literal
 plus sign.  However, many other versions of @command{awk} treat such a
 usage as a syntax error.
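
For scripts that must run under more than one awk, here is a minimal sketch of a way to avoid the undefined case altogether. It relies only on the POSIX text quoted above, which says these characters are special except inside a bracket expression, so the plus sign is written as [+] instead of being placed first in the regexp. It assumes a POSIX shell with some awk on the PATH; the function name literal_plus_foo is made up for illustration.

```shell
# Hypothetical helper: reports whether one input line contains the
# literal string "+foo". The '+' sits inside a bracket expression, where
# it is an ordinary character, so the result does not depend on how a
# particular awk treats a '+' at the start of a regexp.
literal_plus_foo() {
    printf '%s\n' "$1" |
        awk '{ if ($0 ~ /[+]foo/) print "match"; else print "no match" }'
}

literal_plus_foo 'a+foo'    # prints: match
literal_plus_foo 'afoo'     # prints: no match
```

The same bracket-expression spelling should work for a leading '*', '?', or '{'.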


