Re: documentation around RE repetition metachars may need clarification
From: arnold
Subject: Re: documentation around RE repetition metachars may need clarification
Date: Tue, 23 May 2023 02:14:25 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi.
I will eventually push the patch below.
Thanks
Arnold
arnold@skeeve.com wrote:
> Hi.
>
> Thanks for the report. I will look at revising the text
> in the manual.
>
> Arnold
>
> Ed Morton <mortoneccc@comcast.net> wrote:
>
> > In the gawk manual under
> > https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operator-Details.html
> >
> > we have this statement:
> >
> > > In POSIX |awk| and |gawk|, the ‘*’, ‘+’, and ‘?’ operators stand for
> > > themselves when there is nothing in the regexp that precedes them.
> >
> > while in the POSIX spec under
> > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_03
> >
> > we have this statement:
> >
> > > *+?{
> > > The <asterisk>, <plus-sign>, <question-mark>, and <left-brace>
> > > shall be special except when used in a bracket expression (see RE
> > > Bracket Expression
> > > <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05>).
> > > Any of the following uses produce undefined results:
> > >
> > > * If these characters appear first in an ERE
> > >
> >
> > So the gawk manual statement says that /+foo/ in any POSIX awk will
> > match the literal string "+foo" while the POSIX spec statement says it's
> > undefined behavior.
> >
> > Should the gawk manual be tweaked to clarify/explain what it currently
> > says about POSIX awk since it apparently contradicts the POSIX spec?
> >
> > Ed.
---------------------------------------
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index b55e8c8..cde3c22 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -5886,8 +5896,11 @@ As in arithmetic, parentheses can change how operators
 are grouped.
 @cindex POSIX @command{awk} @subentry regular expressions and
 @cindex @command{gawk} @subentry regular expressions @subentry precedence
-In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and
-@samp{?} operators stand for themselves when there is nothing in the
+According to the POSIX specification, when @samp{*}, @samp{+}, @samp{?},
+or @samp{@{} are not preceded by a character, the behavior is
+``undefined.''
+In practice, for @command{gawk}, the @samp{*}, @samp{+}, @samp{?} and
+@samp{@{} operators stand for themselves when there is nothing in the
 regexp that precedes them.  For example, @code{/+/} matches a literal
 plus sign.  However, many other versions of @command{awk} treat such a
 usage as a syntax error.
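
For anyone who wants to check what a particular awk does with a leading
repetition operator, the following is a minimal sketch rather than a
definitive test. The function name probe_leading_plus and the AWK
environment variable are hypothetical, introduced only for this
illustration. It feeds the lines "+foo" and "foo" to /+foo/ and reports
"literal" if only "+foo" matched, "error" if awk rejected the program,
and "other" for any different outcome.

```shell
#!/bin/sh
# Hypothetical helper: report how an awk treats a leading '+' in a regexp.
# Since POSIX leaves this case undefined, the result depends on the awk.
probe_leading_plus() {
    awk_bin=${1:-awk}
    # Feed two lines; a literal reading of /+foo/ matches "+foo" but not "foo".
    out=$(printf '%s\n' '+foo' 'foo' | "$awk_bin" '
        /+foo/ { hit[$0] = 1 }
        END {
            if (hit["+foo"] && !hit["foo"]) { print "literal" } else { print "other" }
        }' 2>/dev/null)
    status=$?
    # A nonzero exit or empty output means awk refused the program.
    if [ "$status" -ne 0 ] || [ -z "$out" ]; then
        echo "error"
    else
        echo "$out"
    fi
}

# Assumption: AWK optionally names the awk to probe; defaults to "awk".
probe_leading_plus "${AWK:-awk}"
```

Going by the revised manual text, gawk would be expected to print
"literal", while an awk that treats the usage as a syntax error should
print "error". Running it as, say, AWK=mawk sh probe.sh (file name
assumed) lets different implementations be compared.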