bug-gawk
Re: setting RS to null changes field splitting with the default FS


From: Ed Morton
Subject: Re: setting RS to null changes field splitting with the default FS
Date: Wed, 1 Apr 2020 08:36:42 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0

Arnold - Thanks for the quick turnaround as usual!

    Ed.

On 4/1/2020 6:45 AM, address@hidden wrote:
Hi Ed.

Thanks for the report.  You have indeed found a buglet.  The
fix is below. I will add this as a test case in the test suite.

Thanks,

Arnold

Ed Morton <address@hidden> wrote:

Setting RS to null in gawk (tested with version 4.1.4 on Mac and 5.0.1
on cygwin) seems to change how field splitting works with the default FS.

I understand this:

     $ echo ' a b c ' | awk '{print NF, "<" $0 ":" RT ">"; for (i=1; i<=NF; i++) print i, "[" $i "]"}'
     3 < a b c :
      >
     1 [a]
     2 [b]
     3 [c]

because the default FS causes leading/trailing white space to be
ignored when the record is split into fields. But now look at this:

     $ echo ' a b c ' | awk -v RS='' '{print NF, "<" $0 ":" RT ">"; for (i=1; i<=NF; i++) print i, "[" $i "]"}'
     4 < a b c :
      >
     1 [a]
     2 [b]
     3 [c]
     4 []

Why is there a 4th field? I THINK it's a bug that in that 2nd script the
trailing white space is not ignored when the record is split into
fields. FWIW I tested that last script with OSX/BSD awk too and it did
strip off the trailing blank and leave 3 fields as I expected.

      Ed.
------------------------------------
diff --git a/field.c b/field.c
index efbc7092..bae16e9c 100644
--- a/field.c
+++ b/field.c
@@ -463,7 +463,10 @@ re_parse_field(long up_to, /* parse only up to this field number */
        if (len == 0)
                return nf;
+       bool default_field_splitting = false;
        if (RS_is_null && default_FS) {
+               default_field_splitting = true;
+
                sep = scan;
                while (scan < end && (*scan == ' ' || *scan == '\t' || *scan == '\n'))
                        scan++;
@@ -504,7 +507,7 @@ re_parse_field(long up_to,  /* parse only up to this field number */
                                (long) (REEND(rp, scan) - RESTART(rp, scan)), sep_arr);
                scan += REEND(rp, scan);
                field = scan;
-               if (scan == end)        /* FS at end of record */
+               if (scan == end && ! default_field_splitting)   /* FS at end of record */
                        (*set)(++nf, field, 0L, n);
        }
        if (nf != up_to && scan < end) {
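
For readers who want to see the effect of the new guard in isolation, here
is a rough, self-contained sketch. It is not gawk's code: count_fields()
and is_blank_or_newline() are hypothetical names, and the splitter is a
deliberately simplified stand-in for re_parse_field() that treats runs of
space, tab and newline as the separator. It only illustrates why a
separator run that reaches the end of the record produced a 4th, empty
field before the patch and no longer does once default_field_splitting is
set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the characters skipped by the loop in the patch. */
static bool is_blank_or_newline(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Simplified model (an assumption, not gawk's implementation) of
 * separator-driven splitting. Leading separators are always skipped.
 * When a separator run ends exactly at the end of the record, an empty
 * trailing field is counted unless default_field_splitting is true,
 * which mirrors the guard added in the second hunk.
 */
static long count_fields(const char *rec, bool default_field_splitting)
{
    assert(rec != NULL);

    const char *scan = rec;
    const char *end = rec + strlen(rec);
    long nf = 0;

    /* Skip leading white space, as the first hunk's loop does. */
    while (scan < end && is_blank_or_newline(*scan))
        scan++;

    while (scan < end) {
        /* Consume one field: a run of non-separator characters. */
        while (scan < end && !is_blank_or_newline(*scan))
            scan++;
        nf++;

        if (scan == end)
            break;          /* record ended inside a field */

        /* Consume the separator run that follows the field. */
        while (scan < end && is_blank_or_newline(*scan))
            scan++;

        /* Separator at end of record: only counts without the guard. */
        if (scan == end && !default_field_splitting)
            nf++;
    }
    return nf;
}
```

Under this model, count_fields(" a b c ", false) gives 4, matching the
reported output, while count_fields(" a b c ", true) gives the expected 3.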


