bug-gawk

Re: [bug-gawk] [External] Re: Invalid Characters Causing Problems in awk


From: arnold
Subject: Re: [bug-gawk] [External] Re: Invalid Characters Causing Problems in awk 4.0.2
Date: Fri, 24 Aug 2018 02:36:48 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Gilbert,

Hi. Welcome to the wonderful world of Unicode, UTF-8, and multibyte encodings.
It is something of a sudden dunk into the cold, deep end of the lake if you've
never been exposed to it before.

Gawk works on characters, not bytes, based on the current locale as
provided by various LC_* environment variables.  This has been the case
since about version 3.1.5.

You have a few options to cause gawk to treat each byte as its own
character:

1. Put LC_ALL=C into the environment. It overrides LANG and the other
LC_* environment variables.

2. Use gawk's -b option.

Or you can adjust your scripts to understand that bytes != characters.
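
To make the difference concrete, here is a minimal sketch of both options.
The sample string and the function names (sample, len_c_locale,
len_b_option, len_default) are made up for illustration and are not from
the original report. The input is "café" encoded as UTF-8, which is
5 bytes but 4 characters.

```sh
#!/bin/sh
# Hypothetical sample input: "café" in UTF-8. The é is the two bytes
# 0xC3 0xA9, so the line holds 5 bytes but 4 characters.
sample() { printf 'caf\303\251\n'; }

# Option 1: LC_ALL=C in gawk's environment, so each byte is a character.
len_c_locale() { sample | LC_ALL=C gawk '{ print length($0) }'; }

# Option 2: gawk's -b option, which has the same effect without
# touching the environment.
len_b_option() { sample | gawk -b '{ print length($0) }'; }

# For comparison: whatever locale the shell currently has.
len_default() { sample | gawk '{ print length($0) }'; }

len_c_locale    # prints 5 (bytes)
len_b_option    # prints 5 (bytes)
len_default     # depends on the locale; 4 in a UTF-8 locale
```

If the third call prints 4 while the first two print 5, the script is
running in a multibyte locale and gawk is counting characters.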

In any case, I would recommend using version 4.2.1, which is the current
released version, instead of 4.0.2, which is six years old.

Best of luck,

Arnold


