From: arnold
Subject: Re: [bug-gawk] Mistaken interpretation of the POSIX standard causes Gawk 4.1.1 not to recognize newlines as field separators with -P
Date: Sun, 31 May 2015 11:59:16 -0400
User-agent: Heirloom mailx 12.5 6/20/10

Hi.

Thank you for sending in a bug report.  Sorry for the delay in
replying; I've been travelling.

> From: Michael Klement <address@hidden>
> Date: Mon, 25 May 2015 13:08:03 -0400
> To: address@hidden
> Subject: [bug-gawk] Mistaken interpretation of the POSIX standard causes
>       Gawk 4.1.1 not to recognize newlines as field separators with -P
> Hi,
>
> Since at least the 2004 edition of the POSIX standard
> (http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html)
> newlines should always be considered field separators, irrespective of
> the value of `FS`; from the "Variables and Special Variables" section:
>
>       "a <newline> shall always be a field separator, no matter what
>       the value of FS is."

You have quoted this out of context. It is in the description of the
RS variable, and applies only when RS = "". It is not relevant to
your point.
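
To illustrate the RS = "" case, here is a rough Python model of what
that passage describes. It is only a sketch, not gawk's code: the
function name paragraph_mode_fields is made up for illustration, and it
assumes FS is a single character other than space.

```python
import re

def paragraph_mode_fields(text, fs):
    """Hypothetical model of awk with RS = "": records are separated by
    blank lines, and newline splits fields in addition to FS."""
    # Leading and trailing newlines are ignored; one or more blank
    # lines separate records.
    records = re.split(r"\n\n+", text.strip("\n"))
    # Newline is a field separator here no matter what FS is.
    splitter = re.compile("[" + re.escape(fs) + "\n]")
    return [splitter.split(rec) for rec in records]

# Two records; the first one spans two lines.
print(paragraph_mode_fields("a:b\nc:d\n\ne:f\n", ":"))
```

In this model the first record yields four fields even though FS is
":", because the embedded newline also separates fields.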

> This language is still present in the 2013 edition,
> (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html).
>
> (The unfortunate thing is that the 2004 edition contained sloppy language
> in the "Description" section, which seemingly contradicts the above:
>
> "a field is a string of non- <blank>s"
>
> This has been corrected in the 2013 edition:
>
> "a field is a string of non- <blank> non- <newline> characters"
> )

This is indeed the key change.
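
The practical difference between the two wordings shows up only when a
record contains an embedded newline (for example, when RS is something
other than newline). A small Python sketch of the two readings, with
hypothetical function names and no claim to match gawk's internals:

```python
import re

def fields_blank_only(record):
    # 2004 wording: a field is a string of non-<blank>s,
    # so only space and tab separate fields.
    return [f for f in re.split(r"[ \t]+", record) if f]

def fields_blank_or_newline(record):
    # 2013 wording: a field is a string of non-<blank>
    # non-<newline> characters, so newline separates fields too.
    return [f for f in re.split(r"[ \t\n]+", record) if f]

record = "a b\nc d"  # one record with an embedded newline
print(fields_blank_only(record))
print(fields_blank_or_newline(record))
```

Under the first reading "b\nc" stays together as a single field; under
the second the record splits into four fields.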

> By contrast, Gawk - as of version  4.1.1 - states in its manual under
> -P, --posix
> 
> Only space and tab act as field separators when FS is set to a single
> space, newline does not.

FWIW, I can't find this exact text in the manual.

> and acts accordingly (except when RS is set to the empty string).
>
> The bottom line is: If the above change in behavior is the only one that
> -P / --posix effects,

It's not, as is clearly documented throughout the manual.

> this option should never have been introduced in the first place,
> because Gawk's *default* behavior is actually the POSIX-compliant one,
> and using the option - somewhat ironically - makes Gawk NON-compliant.

This is incorrect. Because of the earlier language, gawk *was*
POSIX-compliant with --posix.  The recent change you point out allows
me to change gawk's behavior so that newline is treated as a field
separator under --posix too, as it already is by default.

I have changed the code in the master branch such that the same code
is used with and without --posix for default field parsing. I have also
updated the obvious spots in the documentation.  These changes will
appear in the repo shortly and be part of the next major release.

Thanks for bringing this to my attention.

Arnold


