bug-gawk



From: arnold
Subject: Re: minor documentation suggestion for FS values and "whitespace" in general
Date: Tue, 24 Mar 2020 03:55:07 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Whitespace is ' ' and '\t'.  I will clarify the documentation, but
likely not in terms of [[:blank:]], since I suspect that in UTF locales
it can match more than just ' ' and '\t'.

Thanks,

Arnold

Ed Morton <address@hidden> wrote:

> I was just looking up which exact characters get included in the set of 
> field separators when FS is " " (the default value) and got confused by 
> this in the gawk documentation:
>
>     Class    Meaning
>     [:blank:]    Space and TAB characters
>     [:space:]    Space characters (these are: space, TAB, newline,
>     carriage return, formfeed and vertical tab)
>
>     FS == " "
>          Fields are separated by runs of *whitespace*. Leading and
>     trailing whitespace are ignored. This is the default.
>     /(bold added by me)/
>
> I took the last statement above to mean that FS would be the set of 
> characters defined by the [:space:] character class, but it's not, since 
> FS doesn't include carriage return (\r) or vertical tab (\v) when FS is 
> " " (I didn't bother checking the others). Nor is it the [:blank:] 
> character class since it includes newlines (\n). Instead it seems to be 
> [:blank:] plus newline and that's supported by the POSIX spec if we 
> assume by <blank> they mean [:blank:]:
>
>     ...by default, a field is a string of non- <blank> non- <newline>
>     characters.
>
> But what does newline mean in all of the above? Is it always linefeed 
> (\n) on all platforms or is it LF (\n) on UNIX and CRLF (\r\n) on 
> Windows or something else? I really don't know.
>
> So - maybe you could update the documentation to say "Fields are 
> separated by runs of the whitespace (i.e. [:blank:] plus linefeed 
> characters)" or similar? I couldn't find anywhere in the documentation 
> that states exactly which characters FS includes when assigned " " nor 
> what exactly is meant by "whitespace" throughout the documentation and I 
> think that one tweak to provide a clear definition of the term 
> "whitespace" would clarify all of it.
>
>      Ed.

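The splitting rule discussed in this thread can be sketched in Python. This is an illustrative emulation, not gawk's code. It assumes, as Ed observes, that with FS == " " each run of space, TAB, or newline counts as one separator and leading and trailing runs are ignored. The helper `fields` and the constants `BLANK`, `SPACE`, and `DEFAULT_FS` are hypothetical names, and the two classes are spelled out only as they are defined in the POSIX "C" locale. With the default RS, a record never contains a newline, so in practice only space and TAB matter, which fits Arnold's answer.

```python
import re

# Character classes as defined in the POSIX "C" locale; other locales
# (e.g. UTF-8 ones) may add characters to either class.
BLANK = " \t"              # [:blank:]
SPACE = " \t\n\r\v\f"      # [:space:]
# Assumption: the separator set used when FS == " " is [:blank:] plus newline.
DEFAULT_FS = " \t\n"


def fields(record: str, separators: str) -> list[str]:
    """Split a record the way awk does for FS == " ", but with a chosen set.

    Each run of separator characters counts as a single separator and
    leading/trailing runs are ignored, so a field is just a maximal run of
    non-separator characters. Illustrative only; not gawk's implementation.
    """
    return re.findall("[^" + re.escape(separators) + "]+", record)


if __name__ == "__main__":
    record = "  a\tb\r\nc  d\v e  "
    print(fields(record, DEFAULT_FS))  # ['a', 'b\r', 'c', 'd\x0b', 'e']
    print(fields(record, SPACE))       # ['a', 'b', 'c', 'd', 'e']
    print(fields(record, BLANK))       # ['a', 'b\r\nc', 'd\x0b', 'e']
```

The first line mirrors what Ed reports: \r and \v stay inside fields under the assumed default separator set, whereas splitting on the full [:space:] set would drop them, and splitting on [:blank:] alone would keep the CR/LF pair inside a single field.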

