Re: [bug-gawk] How to convert recognize a string as a Unicode char?
From: Neil R. Ormos
Subject: Re: [bug-gawk] How to convert recognize a string as a Unicode char?
Date: Mon, 13 May 2019 19:05:33 -0500 (CDT)
Peng Yu wrote:
> Suppose that there is a file that contains something like the
> following, is there a way to recognize as the corresponding Unicode
> chars instead of two strings starting with "0x"?
> 0x2591
> 0x2592
It might be helpful if you would state:
* on which operating system you expect to do this;
* in which locale (if the OS is locale-aware);
* in which target character encoding you want the
results; and
* how you intend to establish "correspondence" between
the input data and "Unicode chars".
If the answers are Linux, en_US.utf8, UTF-8, and
"each line of input data is the number of a Unicode
code point, written in hexadecimal", this trivial
snippet seems to work in gawk 4.1.4 and 5.0.0:
gawk --non-decimal-data '{c=0+$0; a=sprintf("%c", c); print length(a); printf "%s\n", a;}'
For older versions of gawk, you might need a chicane.
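One possible shape for such a detour, offered only as a sketch: compute the UTF-8 bytes by hand instead of relying on "%c" to understand code points. The names hex_to_utf8, hex2num, and utf8 are made up for this example and are not part of gawk. It assumes that, under LC_ALL=C, sprintf("%c", n) emits a single byte for n up to 255, and it does not reject surrogate code points.

```shell
# Hypothetical helper: read lines such as "0x2591" on stdin and print
# the UTF-8 encoding of each code point, one per line.
hex_to_utf8() {
    # LC_ALL=C so that "%c" with a number yields one raw byte.
    LC_ALL=C awk '
    # Parse a hexadecimal string (optional 0x prefix) without strtonum().
    function hex2num(s,    i, d, n) {
        s = tolower(s)
        sub(/^0x/, "", s)
        if (length(s) == 0) return -1      # nothing to parse
        n = 0
        for (i = 1; i <= length(s); i++) {
            d = index("0123456789abcdef", substr(s, i, 1))
            if (d == 0) return -1          # not a hex digit
            n = n * 16 + d - 1
        }
        return n
    }
    # Encode one code point as 1 to 4 UTF-8 bytes.
    function utf8(c) {
        if (c < 128)   return sprintf("%c", c)
        if (c < 2048)  return sprintf("%c%c", 192 + int(c / 64), 128 + c % 64)
        if (c < 65536) return sprintf("%c%c%c", 224 + int(c / 4096), 128 + int(c / 64) % 64, 128 + c % 64)
        return sprintf("%c%c%c%c", 240 + int(c / 262144), 128 + int(c / 4096) % 64, 128 + int(c / 64) % 64, 128 + c % 64)
    }
    # Skip lines that are not valid hex or exceed U+10FFFF (1114111).
    { c = hex2num($1); if (c >= 0 && c <= 1114111) print utf8(c) }
    '
}
```

Used as `printf '0x2591\n0x2592\n' | hex_to_utf8`, this should emit the two shade characters on a UTF-8 terminal. Avoiding strtonum() and --non-decimal-data keeps the sketch independent of the gawk version, at the cost of doing the bit arithmetic with division and modulo.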