bug-gawk

Re: [bug-gawk] [External] Re: Invalid Characters Causing Problems in awk


From: Wolfgang Laun
Subject: Re: [bug-gawk] [External] Re: Invalid Characters Causing Problems in awk 4.0.2
Date: Fri, 24 Aug 2018 12:06:23 +0200

You get other inconsistencies if a text file's encoding doesn't match the system's default. If /bin/cat displays the text correctly, /usr/bin/wc can be trusted. But primarily I rely on /usr/bin/od to show me what a text file actually contains. For writing code, I prefer programming systems that let me override system and environment settings.
-W
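
As a rough illustration of both points (inspecting raw bytes the way od does, and naming the encoding explicitly instead of inheriting it from the environment), here is a minimal Python sketch. The helper name hex_dump and the three-character sample are made up for this example; they do not come from the thread.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an od-like dump: octal offset followed by hex byte values."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{offset:07o} {hex_bytes}")
    return "\n".join(lines)


# Hypothetical sample: three of the Spanish characters plus a newline,
# encoded explicitly as UTF-8 regardless of the current locale.
sample = "¡¿ñ\n".encode("utf-8")

# The dump is pure ASCII, so it prints the same in any locale.
print(hex_dump(sample))

# Decoding names the codec explicitly, which overrides whatever the
# locale or environment would otherwise suggest.
text = sample.decode("utf-8")
```

Each of the three characters shows up as a two-byte sequence in the dump, which is the kind of evidence that settles what a file really contains.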

On 24 August 2018 at 11:33, Eli Zaretskii <address@hidden> wrote:
> From: Wolfgang Laun <address@hidden>
> Date: Fri, 24 Aug 2018 08:28:07 +0200
> Cc: "address@hidden" <address@hidden>
>
> File diacrit.txt contains all the 20 non-ASCII characters you need for Spanish in one line (including \n) with
> UTF-8 encoding:
>
> ¡¿ªºÁáÉéÍíÑñÓóÚúÜüÇç
>
> $ wc -c diacrit.txt
> 41 diacrit.txt
> $ wc -m diacrit.txt
> 21 diacrit.txt

Of course, this only works if the file is encoded in the same encoding
as specified by the current locale.  Because 'wc' doesn't detect the
encoding, it assumes the locale's codeset.

E.g., try the same in a locale whose codeset is ISO 8859-1, while the
file is still UTF-8 encoded.
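
The effect can be sketched without switching locales by decoding the same bytes under two codecs. This is only an approximation in Python of what 'wc -c' and 'wc -m' count, under the assumption that the characters are stored precomposed (one code point each), as the 41-byte count above implies. The variable names are illustrative.

```python
import unicodedata

# The line from diacrit.txt: 20 non-ASCII characters plus a newline.
# NFC normalization guards against a decomposed copy of the literal.
line = unicodedata.normalize("NFC", "¡¿ªºÁáÉéÍíÑñÓóÚúÜüÇç\n")
data = line.encode("utf-8")

# Roughly what 'wc -c' reports: the number of bytes.
byte_count = len(data)

# Roughly what 'wc -m' reports when the locale's codeset is UTF-8.
chars_as_utf8 = len(data.decode("utf-8"))

# What a character count becomes if the same bytes are interpreted as
# ISO 8859-1, where every byte is one character.
chars_as_latin1 = len(data.decode("iso-8859-1"))

print(byte_count, chars_as_utf8, chars_as_latin1)
```

Under the ISO 8859-1 interpretation the character count equals the byte count, so a character count taken in the wrong locale no longer says anything about how many characters the text contains.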

