Re: minor documentation suggestion for FS values and "whitespace" in gen
From: arnold
Subject: Re: minor documentation suggestion for FS values and "whitespace" in general
Date: Tue, 24 Mar 2020 03:55:07 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Whitespace is ' ' and '\t'. I will clarify the documentation, but
likely not in terms of [[:blank:]], since I suspect that in UTF locales
it can match more than just ' ' and '\t'.
Thanks,
Arnold
Ed Morton <address@hidden> wrote:
> I was just looking up which exact characters get included in the set of
> field separators when FS is " " (the default value) and got confused by
> this in the gawk documentation:
>
> Class Meaning
> [:blank:] Space and TAB characters
> [:space:] Space characters (these are: space, TAB, newline,
> carriage return, formfeed and vertical tab)
>
> FS == " "
> Fields are separated by runs of *whitespace*. Leading and
> trailing whitespace are ignored. This is the default.
> /(bold added by me)/
>
> I took the last statement above to mean that FS would be the set of
> characters defined by the [:space:] character class but it's not since
> FS doesn't include carriage return (\r) nor vertical tab (\v) (I didn't
> bother checking others) when FS is " ", neither is it the [:blank:]
> character class since it includes newlines (\n). Instead it seems to be
> [:blank:] plus newline and that's supported by the POSIX spec if we
> assume by <blank> they mean [:blank:]:
>
> ...by default, a field is a string of non- <blank> non- <newline>
> characters.
>
> But what does newline mean in all of the above? Is it always linefeed
> (\n) on all platforms or is it LF (\n) on UNIX and CRLF (\r\n) on
> Windows or something else? I really don't know.
>
> So - maybe you could update the documentation to say "Fields are
> separated by runs of the whitespace (i.e. [:blank:] plus linefeed
> characters)" or similar? I couldn't find anywhere in the documentation
> that states exactly which characters FS includes when assigned " " nor
> what exactly is meant by "whitespace" throughout the documentation and I
> think that one tweak to provide a clear definition of the term
> "whitespace" would clarify all of it.
>
> Ed.
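
For illustration only, here is a rough Python model of the splitting rule described in the quoted message: under the default FS, runs of space, TAB and newline separate fields, leading and trailing runs are ignored, and carriage return and vertical tab are not separators. The function name `split_default_fs` and the fixed three-character separator set are assumptions taken from this thread, not from gawk's source, and other awk implementations may behave differently.

```python
import re

# Assumed separator set for FS == " ", as reported in this thread:
# space, TAB and newline only (not \r, \v or \f).
_SEPARATOR_RUN = re.compile(r"[ \t\n]+")


def split_default_fs(record):
    """Split a record into fields on runs of space, TAB or newline.

    Leading and trailing separator runs are ignored, so they never
    produce empty fields. Hypothetical helper, for illustration only.
    """
    stripped = record.strip(" \t\n")
    if not stripped:
        return []
    return _SEPARATOR_RUN.split(stripped)


if __name__ == "__main__":
    # Space, TAB and newline all act as separators.
    print(split_default_fs("  a \t b\nc  "))
    # Carriage return and vertical tab stay inside the first field.
    print(split_default_fs("a\rb\vc d"))
```

Writing the separator set out explicitly, rather than using a character class such as `\s`, avoids depending on how a given locale or regex engine defines "whitespace".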