bug-gawk

Re: documentation around RE repetition metachars may need clarification


From: Ed Morton
Subject: Re: documentation around RE repetition metachars may need clarification
Date: Tue, 23 May 2023 06:06:30 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

Sounds good, thanks.

    Ed.

On 5/23/2023 3:14 AM, arnold@skeeve.com wrote:
Hi.

I will eventually push the patch below.

Thanks

Arnold

arnold@skeeve.com wrote:

Hi.

Thanks for the report. I will look at revising the text
in the manual.

Arnold

Ed Morton <mortoneccc@comcast.net> wrote:

In the gawk manual under
https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operator-Details.html
we have this statement:

    In POSIX awk and gawk, the ‘*’, ‘+’, and ‘?’ operators stand for
    themselves when there is nothing in the regexp that precedes them.

while in the POSIX spec under
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_03
we have this statement:

*+?{
     The <asterisk>, <plus-sign>, <question-mark>, and <left-brace>
     shall be special except when used in a bracket expression (see RE
     Bracket Expression
     <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05>).
     Any of the following uses produce undefined results:

      * If these characters appear first in an ERE

So the gawk manual statement says that /+foo/ in any POSIX awk will
match the literal string "+foo" while the POSIX spec statement says it's
undefined behavior.

Should the gawk manual be tweaked to clarify/explain what it currently
says about POSIX awk since it apparently contradicts the POSIX spec?

      Ed.
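
For anyone who needs to match a literal leading "+" without relying on the undefined case, one portable spelling is a bracket expression, since the POSIX text quoted above says these characters are special except when used inside one. A minimal sketch, assuming any POSIX-style awk on the PATH; the function name literal_plus_lines and the sample input are made up for illustration:

```shell
# Match a literal "+" followed by "foo" without starting the ERE
# with a bare repetition operator.
literal_plus_lines() {
    # Feed three sample lines; only those containing "+foo" should print.
    printf '%s\n' '+foo' 'foo' 'a+foo' |
        awk '/[+]foo/'   # "[+]" is a bracket expression, so "+" is literal
}

literal_plus_lines
```

Whether a bare /+foo/ behaves the same way depends on the awk in use, which is the point of the report.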
---------------------------------------
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index b55e8c8..cde3c22 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -5886,8 +5896,11 @@ As in arithmetic, parentheses can change how operators are grouped.
 @cindex POSIX @command{awk} @subentry regular expressions and
 @cindex @command{gawk} @subentry regular expressions @subentry precedence
-In POSIX @command{awk} and @command{gawk}, the @samp{*}, @samp{+}, and
-@samp{?} operators stand for themselves when there is nothing in the
+According to the POSIX specification, when @samp{*}, @samp{+}, @samp{?},
+or @samp{@{} are not preceded by a character, the behavior is
+``undefined.''
+In practice, for @command{gawk}, the @samp{*}, @samp{+}, @samp{?} and
+@samp{@{} operators stand for themselves when there is nothing in the
 regexp that precedes them.  For example, @code{/+/} matches a literal
 plus sign.  However, many other versions of @command{awk} treat such a
 usage as a syntax error.


