Re: [bug-gawk] gcc4.0.0 regression


From: Aharon Robbins
Subject: Re: [bug-gawk] gcc4.0.0 regression
Date: Thu, 28 Jul 2011 22:12:23 +0300
User-agent: Heirloom mailx 12.4 7/29/08

Greetings. Re this:

> Date: Thu, 21 Jul 2011 08:50:49 -0400
> From: John Ellson <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] gcc4.0.0 regression
>
> The '\' escape behaviour of gsub() has changed in gawk-4.0.0.  This is
> breaking a ~20-year-old script in graphviz that converts postscript text
> to C strings.
>
> $ rpm -q gawk
> gawk-3.1.8-3.fc15.x86_64
> $ echo '\\' | gawk '{gsub("\\\\","\\\\",$0); print($0);}'
> \\\\
>
> $ rpm -q gawk
> gawk-4.0.0-1.fc16.x86_64
> $ echo '\\' | gawk '{gsub("\\\\","\\\\",$0); print($0);}'
> \\
>
> This problem has also been reported to:
>       https://bugzilla.redhat.com/show_bug.cgi?id=723878
>
> John

Thanks for the report. Per our private email discussion, a portable
solution is

        echo '\\' | gawk '{gsub("\\\\","&&",$0); print($0);}'

which works in current and previous gawk.

Nonetheless, as I don't want this to be a FAQ from now until the
End Of Time, I have reverted gawk's behavior to what it was. A diff
is attached, and I will shortly push it to the gawk-4.0-stable
branch in the git repo.

Thanks,

Arnold
-------------------------------------------------------
diff --git a/ChangeLog b/ChangeLog
index 5b06317..f06fd57 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2011-07-28         Arnold D. Robbins     <address@hidden>
+
+       * builtin.c (do_sub): Revert to gawk 3.1 behavior for backslash
+       handling. It was stupid to think I could break compatibility.
+
 2011-07-26         John Haque      <address@hidden>
 
        * eval.c (r_interpret): In cases Op_var_assign and Op_field_assign,
diff --git a/builtin.c b/builtin.c
index 8685d29..4d87592 100644
--- a/builtin.c
+++ b/builtin.c
@@ -2546,13 +2546,30 @@ set_how_many:
                                        repllen--;
                                        scan++;
                                }
-                       } else {
+                       } else if (do_posix) {
                                /* \& --> &, \\ --> \ */
                                if (scan[1] == '&' || scan[1] == '\\') {
                                        repllen--;
                                        scan++;
                                } /* else
                                        leave alone, it goes into the output */
+                       } else {
+                               /* gawk default behavior since 1996 */
+                               if (strncmp(scan, "\\\\\\&", 4) == 0) {
+                                       /* \\\& --> \& */
+                                       repllen -= 2;
+                                       scan += 3;
+                               } else if (strncmp(scan, "\\\\&", 3) == 0) {
+                                       /* \\& --> \<string> */
+                                       ampersands++;
+                                       repllen--;
+                                       scan += 2;
+                               } else if (scan[1] == '&') {
+                                       /* \& --> & */
+                                       repllen--;
+                                       scan++;
+                               } /* else
+                                       leave alone, it goes into the output */
                        }
                }
        }
@@ -2630,11 +2647,30 @@ set_how_many:
                                                        scan++;
                                                } else  /* \q for any q --> q */
                                                        *bp++ = *++scan;
-                                       } else {
+                                       } else if (do_posix) {
                                                /* \& --> &, \\ --> \ */
                                                if (scan[1] == '&' || scan[1] == '\\')
                                                        scan++;
                                                *bp++ = *scan;
+                                       } else {
+                                               /* gawk default behavior since 1996 */
+                                               if (strncmp(scan, "\\\\\\&", 4) == 0) {
+                                                       /* \\\& --> \& */
+                                                       *bp++ = '\\';
+                                                       *bp++ = '&';
+                                                       scan += 3;
+                                               } else if (strncmp(scan, "\\\\&", 3) == 0) {
+                                                       /* \\& --> \<string> */
+                                                       *bp++ = '\\';
+                                                       for (cp = matchstart; cp < matchend; cp++)
+                                                               *bp++ = *cp;
+                                                       scan += 2;
+                                               } else if (scan[1] == '&') {
+                                                       /* \& --> & */
+                                                       *bp++ = '&';
+                                                       scan++;
+                                               } else
+                                                       *bp++ = *scan;
                                        }
                                } else
                                        *bp++ = *scan;
diff --git a/doc/gawk.texi b/doc/gawk.texi
index b56cbbc..2891c90 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -15023,8 +15023,6 @@ case of even numbers of backslashes entered at the lexical level.)
 The problem with the historical approach is that there is no way to get
 a literal @samp{\} followed by the matched text.
 
address@hidden We can omit this historical stuff now
address@hidden
 @c @cindex @command{awk} language, POSIX version
 @cindex POSIX @command{awk}, functions and, @code{gsub()}/@code{sub()}
 The 1992 POSIX standard attempted to fix this problem. That standard
@@ -15158,7 +15156,6 @@ in the output literally.
 The POSIX standard took much longer to be revised than was expected in 1996.
 The 2001 standard does not follow the above rules.  Instead, the rules
 there are somewhat simpler.  The results are similar except for one case.
address@hidden ignore
 
 The POSIX rules state that @samp{\&} in the replacement string produces
 a literal @samp{&}, @samp{\\} produces a literal @samp{\}, and @samp{\} followed
@@ -15209,17 +15206,21 @@ These rules are presented in @ref{table-posix-sub}.
 @end ifnottex
 @end float
 
address@hidden
 The only case where the difference is noticeable is the last one: @samp{\\\\}
 is seen as @samp{\\} and produces @samp{\} instead of @samp{\\}.
 
 Starting with @value{PVERSION} 3.1.4, @command{gawk} followed the POSIX rules
 when @option{--posix} is specified (@pxref{Options}). Otherwise,
 it continued to follow the 1996 proposed rules, since
-that had been its behavior for many seven years.
address@hidden ignore
-
address@hidden follows the POSIX rules.
+that had been its behavior for many years.
+
+When @value{PVERSION} 4.0.0, was released, the @command{gawk} maintainer
+made the POSIX rules the default, breaking well over a decade's worth
+of backwards address@hidden was rather naive of him, despite
+there being a note in this section indicating that the next major version
+would move to the POSIX rules.} Needless to say, this was a bad idea,
+and as of @value{PVERSION} 4.0.1, @command{gawk} resumed its historical
+behavior, and only follows the POSIX rules when @option{--posix} is given.
 
 The rules for @code{gensub()} are considerably simpler. At the runtime
 level, whenever @command{gawk} sees a @samp{\}, if the following character
diff --git a/test/ChangeLog b/test/ChangeLog
index 2c6bec5..21ccf3e 100644
--- a/test/ChangeLog
+++ b/test/ChangeLog
@@ -2,6 +2,8 @@
 
        * sortu.awk, sortu.ok: Modified to make numeric comparison do
        a stable sort.  Thanks to Peter Fales <address@hidden>.
+       * backgsub.ok: Update for change in code.
+       * Makefile.am (posix2008sub): Add --posix to invocation.
 
 2011-07-26         Arnold D. Robbins     <address@hidden>
 
diff --git a/test/Makefile.am b/test/Makefile.am
index 82e0834..9780e79 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -1376,9 +1376,14 @@ profile3:
        @sed 1,2d < awkprof.out > _$@; rm awkprof.out
        @-$(CMP) $(srcdir)/address@hidden _$@ && rm -f _$@
 
+posix2008sub:
+       @echo $@
+       @$(AWK) --posix -f $(srcdir)/address@hidden > _$@ 2>&1
+       @-$(CMP) $(srcdir)/address@hidden _$@ && rm -f _$@
+
 next:
        @echo $@
-       @-AWK="$(AWKPROG)" $(srcdir)/address@hidden > _$@ 2>&1
+       @-AWK="$(AWKPROG)" $(srcdir)/address@hidden
        @-$(CMP) $(srcdir)/address@hidden _$@ && rm -f _$@
 
 exit:
diff --git a/test/backgsub.ok b/test/backgsub.ok
index 2d3f17f..e2e265f 100644
--- a/test/backgsub.ok
+++ b/test/backgsub.ok
@@ -1 +1 @@
-\x\y\z
+\\x\\y\\z


