Re: [bug-gawk] bug-gawk Digest, Vol 88, Issue 18


From: Denis Shirokov
Subject: Re: [bug-gawk] bug-gawk Digest, Vol 88, Issue 18
Date: Thu, 23 Aug 2018 17:47:09 +0300

Try "gawk -b". It forces gawk to treat every byte as a single character.
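
As a rough illustration of what treating every byte as a character means, here is a small Python model. It is only a sketch: `awk_like_length` and its `bytes_mode` flag are hypothetical names invented for this example, and the code does not reproduce gawk's internals.

```python
def awk_like_length(raw: bytes, bytes_mode: bool) -> int:
    """Hypothetical helper: count the 'characters' in a raw input record."""
    if bytes_mode:
        # Byte mode: every byte counts as one character.
        return len(raw)
    # Multibyte mode: decode as UTF-8 and count the decoded characters.
    return len(raw.decode("utf-8"))


# "Pena" with n-tilde (U+00F1), encoded as UTF-8: b'Pe\xc3\xb1a'
record = "Pe\u00f1a".encode("utf-8")

print(awk_like_length(record, bytes_mode=False))  # 4 characters
print(awk_like_length(record, bytes_mode=True))   # 5 bytes
```

In byte mode the record can never be "invalid", because any byte value is accepted as a character; the cost is that lengths and positions are counted in bytes.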

Thu, 23 Aug 2018, 17:28 <address@hidden>:

Today's Topics:

   1. Re: Invalid Characters Causing Problems in awk 4.0.2
      (Wolfgang Laun)
   2. Re: Invalid Characters Causing Problems in awk 4.0.2
      (address@hidden)
   3. Re: [External] Re: Invalid Characters Causing Problems in awk
      4.0.2 (Gilbert, Brandon (Synchrony))


----------------------------------------------------------------------

Message: 1
Date: Thu, 23 Aug 2018 05:47:25 +0200
From: Wolfgang Laun <address@hidden>
To: "Gilbert, Brandon (Synchrony)" <address@hidden>
Cc: "address@hidden" <address@hidden>
Subject: Re: [bug-gawk] Invalid Characters Causing Problems in awk
        4.0.2
Message-ID:
        <address@hidden>
Content-Type: text/plain; charset="utf-8"

What is a "non-standard character"? ISO 10646 is quite comprehensive. - Bug
notices without examples aren't likely to cause a stir.

-W

On 22 August 2018 at 22:48, Gilbert, Brandon (Synchrony) <
address@hidden> wrote:

> Hi,
>
>
>
> We are converting from one Linux system to another Linux system.
>
> The old system has awk version 3.1.3 and the new version has awk 4.0.2.
>
>
>
> In version 3.1.3, text records with non-standard characters are
> processed by awk with no problem.
>
> In version 4.0.2, text records with non-standard characters are
> ignored and not processed.
>
>
>
> Is there a way to fix this issue, or to be able to ignore non-standard
> characters with this newer version of awk?  Or is there a newer version
> than 4.0.2 that will resolve this issue?
>
>
>
> Thank you.
>
>
>
>
>
> *Brandon Gilbert*
> IT Analyst
>
> Canton Video Committee Lead
>
> Synchrony
>
> T: 330-433-5042
> E: *address@hidden <address@hidden>*
>
> 4500 Munson St NW
> Canton, OH 44718, U.S.

------------------------------

Message: 2
Date: Wed, 22 Aug 2018 22:05:12 -0600
From: address@hidden
To: address@hidden, address@hidden
Subject: Re: [bug-gawk] Invalid Characters Causing Problems in awk
        4.0.2
Message-ID: <address@hidden>
Content-Type: text/plain; charset=us-ascii

Hi.

An example would help.

The latest version of gawk is 4.2.1, much further ahead of 4.0.2; the version
you're upgrading from (3.1.3) is pushing 16 years old! Even 4.0.2 is six
years old!

In any case, I suspect that if you do

        export LC_ALL=C

before running any scripts, things will work as you expect them
to.
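
One way a locale change can matter is sketched below in Python. The thread does not say which bytes the problem records contain, so this rests on an assumption: that some records hold bytes that are not valid UTF-8 (Latin-1 text, for example). In the C locale every byte is a valid single-byte character, whereas a UTF-8 locale can reject such sequences. `is_valid_utf8` is a hypothetical helper written for this illustration.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: report whether raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same name stored in two encodings (assumed sample data).
latin1_record = "Pe\u00f1a".encode("latin-1")  # b'Pe\xf1a'
utf8_record = "Pe\u00f1a".encode("utf-8")      # b'Pe\xc3\xb1a'

print(is_valid_utf8(latin1_record))  # False: 0xF1 starts a sequence that never completes
print(is_valid_utf8(utf8_record))    # True
print(len(latin1_record))            # 4: counted as plain bytes, nothing is rejected
```

Running a check like this over a sample of the affected records would show whether invalid sequences are involved at all.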

Thanks,

Arnold



------------------------------

Message: 3
Date: Thu, 23 Aug 2018 14:00:58 +0000
From: "Gilbert, Brandon (Synchrony)" <address@hidden>
To: Wolfgang Laun <address@hidden>
Cc: "address@hidden" <address@hidden>
Subject: Re: [bug-gawk] [External] Re: Invalid Characters Causing
        Problems in awk 4.0.2
Message-ID:
        <address@hidden>
Content-Type: text/plain; charset="utf-8"

Hi,

It is mostly with special Spanish characters in names and trademark characters in business names.  Due to the confidentiality of the data, I am unable to send examples.  I can say that when I pulled the records into UltraEdit and highlighted characters on a line, it showed the byte size as doubled (1 character showed a byte length of 2, 2 characters showed 4, etc.).

Doing some online research since sending the first e-mail, I found a message board where someone noted the following:
For a given awk implementation to work properly with non-ASCII characters (foreign letters), it must respect the active locale's character encoding, as reflected in the (effective) LC_CTYPE setting (run locale to see it).
These days, most locales use UTF-8 encoding, a multi-byte-on-demand encoding that is single-byte in the ASCII range, and uses 2 to 4 bytes to represent all other Unicode characters.
Thus, for a given awk implementation to recognize non-ASCII (accented, foreign) letters, it must be able to recognize multiple bytes as a single character.
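
The byte counts described above can be checked with a short Python sketch. The sample characters are chosen for illustration only and do not come from the confidential data; `utf8_width` is a hypothetical helper name.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


samples = {
    "n": "n",                    # ASCII letter: 1 byte
    "n with tilde": "\u00f1",    # U+00F1: 2 bytes
    "trademark sign": "\u2122",  # U+2122: 3 bytes
    "emoji": "\U0001F600",       # U+1F600: 4 bytes
}

for name, ch in samples.items():
    # Each sample is one character, but its encoded width varies.
    print(f"{name}: {len(ch)} character, {utf8_width(ch)} byte(s)")
```

A tool that counts bytes reports 2 for an accented Spanish letter stored as UTF-8, while a tool that counts characters reports 1.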

So I compared the output of the locale command on each system.  The older system, which does not have problems with the characters, has LC_COLLATE=C, and the new system, which does have problems, has LC_COLLATE="en_US.UTF-8".  All other settings match and are set to en_US.UTF-8.  Could this be a cause?

Thank you for your help!
-Brandon

------------------------------

End of bug-gawk Digest, Vol 88, Issue 18
****************************************
