bug-gawk

Re: [bug-gawk] How to convert recognize a string as a Unicode char?


From: Neil R. Ormos
Subject: Re: [bug-gawk] How to convert recognize a string as a Unicode char?
Date: Mon, 13 May 2019 19:05:33 -0500 (CDT)

Peng Yu wrote:

> Suppose that there is a file that contains something like the
> following, is there a way to recognize as the corresponding Unicode
> chars instead of two strings starting with "0x"?
 
> 0x2591
> 0x2592

It might be helpful if you would state:

  *  on which operating system you expect to do this;

  *  in which locale (if the OS is locale-aware);

  *  in which target character encoding you want the 
     results; and

  *  how you intend to establish "correspondence" between 
     the input data and "Unicode chars".

If the answers are Linux, en_US.utf8, UTF-8, and
"each line of input data is intended to represent,
in hexadecimal, the number of the Unicode code
point", this trivial snippet seems to work in
gawk 4.1.4 and 5.0.0:

gawk --non-decimal-data '{c=0+$0; a=sprintf("%c", c); print length(a); printf "%s\n", a;}'

For older versions of gawk, you might need a chicane.


