From: Gilbert, Brandon (Synchrony)
Subject: Re: [bug-gawk] [External] Re: Invalid Characters Causing Problems in awk 4.0.2
Date: Thu, 23 Aug 2018 14:00:58 +0000

Hi,

It is mostly with special Spanish characters in names and trademark characters in business names. Due to the confidentiality of the data, I am unable to send examples.

I can say that when I pulled the records into Ultra-Edit and highlighted characters on the line, it showed the byte size as doubled (1 character showed a byte length of 2, 2 characters showed 4, etc.).

Doing some online research since sending the 1st e-mail to you, I found a message board where someone noted the following:

For a given awk implementation to work properly with non-ASCII characters (foreign letters), it must respect the active locale's character encoding, as reflected in the (effective) LC_CTYPE setting (run locale to see it).

These days, most locales use UTF-8 encoding, a multi-byte-on-demand encoding that is single-byte in the ASCII range, and uses 2 to 4 bytes to represent all other Unicode characters. Thus, for a given awk implementation to recognize non-ASCII (accented, foreign) letters, it must be able to recognize multiple bytes as a single character.

So I compared the locale command output on each system.
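
The byte counts described above (one character reported as 2 bytes) are consistent with how UTF-8 encodes non-ASCII characters. As a rough illustration only, here is a minimal Python sketch that compares character count with UTF-8 byte count. Python is used purely for demonstration, the helper name `utf8_sizes` is hypothetical, and the sample strings are made up; they are not data from this thread.

```python
# Compare character count with UTF-8 byte count for a few sample strings.
# The samples are invented for illustration and are not from the thread.
samples = ["Jose", "Jos\u00e9", "Pe\u00f1a", "Acme\u2122"]


def utf8_sizes(text):
    """Return (characters, bytes) for text when encoded as UTF-8."""
    return len(text), len(text.encode("utf-8"))


for s in samples:
    chars, nbytes = utf8_sizes(s)
    print(f"{s!r}: {chars} characters, {nbytes} bytes")
```

A tool that counts bytes will report a larger size than one that counts characters whenever the text contains non-ASCII characters, which may be what the editor's byte display was showing.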

The older system, which does not have problems with the characters, has LC_COLLATE=C, and the new system, which does have problems, has LC_COLLATE="en_US.UTF-8". All other settings match and are set to en_US.UTF-8.

Could this be a cause?

Thank you for your help!

…Brandon

From: Wolfgang Laun <address@hidden>

What is a "non-standard character"? ISO 10646 is quite comprehensive. - Bug notices without examples aren't likely to cause a stir.

-W

On 22 August 2018 at 22:48, Gilbert, Brandon (Synchrony) <address@hidden> wrote: