[bug-gawk] When RS is null, POSIX states \n should be in FS, gawk only d
From: Ed Morton
Subject: [bug-gawk] When RS is null, POSIX states \n should be in FS, gawk only does that if FS is single char
Date: Mon, 15 Apr 2019 08:35:28 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1
I just came across this: setting RS to null causes FS to include
`\n` if FS is a single char, but not otherwise:
$ printf '1:2\n3\n' | awk -F':' -v RS= '{for (i=1; i<=NF; i++) print i"/"NF, "<"$i">"}'
1/3 <1>
2/3 <2>
3/3 <3>
$ printf '1::2\n3\n' | awk -F'::' -v RS= '{for (i=1; i<=NF; i++) print i"/"NF, "<"$i">"}'
1/2 <1>
2/2 <2
3>
with this gawk version:
$ awk --version
GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)
Copyright (C) 1989, 1991-2018 Free Software Foundation.
and that makes sense given the gawk documentation
(https://www.gnu.org/software/gawk/manual/gawk.html#Multiple-Line) which
says (red/underline mine):
When RS is set to the empty string _and FS is set to a single
character_, the newline character always acts as a field separator.
This is in addition to whatever field separations result from FS.
but the POSIX spec (http://pubs.opengroup.org/onlinepubs/9699919799/) says:
*RS*
The first character of the string value of *RS* shall be the
input record separator; a <newline> by default. If *RS* contains
more than one character, the results are unspecified. If *RS* is
null, then records are separated by sequences consisting of a
<newline> plus one or more blank lines, leading or trailing
blank lines shall not result in empty records at the beginning
or end of the input, and a <newline> shall always be a field
separator, no matter what the value of *FS* is.
gawk behaves the way I described with or without the `--posix` flag.
Shouldn't it add `\n` as a separator when RS is null regardless of the
value of FS like POSIX says? FWIW OSX/BSD awk on MacOS behaves the same
way that gawk does, idk about other awks.
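In the meantime, one possible workaround is to put `\n` into FS explicitly whenever FS is longer than one character. This is only a sketch, assuming a regex FS is acceptable for the data; it reuses the second example above with FS changed from `::` to the regex `::|\n`:

```shell
# Sketch: with RS null and a multi-char FS, add \n to FS by hand so that
# newline splits fields as well (FS is treated as a regex: "::" or newline).
out=$(printf '1::2\n3\n' | awk -F'::|\n' -v RS= '{for (i=1; i<=NF; i++) print i"/"NF, "<"$i">"}')
printf '%s\n' "$out"
```

With that FS the record should split into three fields (1, 2 and 3), the same as in the single-char case.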
Ed.