Re: [bug-gawk] How to convert recognize a string as a Unicode char?

From:	Assaf Gordon
Subject:	Re: [bug-gawk] How to convert recognize a string as a Unicode char?
Date:	Tue, 14 May 2019 00:30:34 -0600
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

Hello,


On 2019-05-13 6:05 p.m., Neil R. Ormos wrote:

Peng Yu wrote:

Suppose that there is a file that contains something like the
following, is there a way to recognize as the corresponding Unicode
chars instead of two strings starting with "0x"?

0x2591
0x2592

[...]

gawk --non-decimal-data '{c=0+$0; a=sprintf("%c", c); print length(a); printf "%s\n", a;}'

For older versions of gawk, you might need a chicane.


Or, if you can use other tools, coreutils' printf can print
Unicode code points directly:

   env printf '\u2591\u2592\n'

So just changing '0x' to '\u' and passing on to printf would do the job:

  printf "%s\n" 0x2591 0x2592 | sed 's/^0x/\\\\u/g' | xargs -n1 printf
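
The four backslashes in the sed replacement are easy to get wrong, so here is a small sketch of what each stage emits. sed turns each `\\` pair in the replacement into one literal backslash, and xargs (in its default mode, without -0 or -d) treats a backslash as a quoting character and removes one more level before running the command. The variable names `after_sed` and `after_xargs` are only for illustration, and `printf '%s\n'` stands in for the final printf so that the argument is shown literally instead of being interpreted.

```shell
# What sed emits: two literal backslashes followed by u and the hex digits.
after_sed=$(printf '%s\n' 0x2591 | sed 's/^0x/\\\\u/')
printf '%s\n' "$after_sed"      # \\u2591

# What xargs hands to the command: one level of backslash quoting is
# removed, so a single backslash remains in front of the u.
after_xargs=$(printf '%s\n' 0x2591 | sed 's/^0x/\\\\u/' | xargs -n1 printf '%s\n')
printf '%s\n' "$after_xargs"    # \u2591
```

With only two backslashes in the sed expression, xargs would consume the single remaining backslash and printf would receive plain `u2591`.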

Or, if you can convert the ASCII hex digits to binary (i.e. '0' 'x' '2' '5' '9' '1' to
'\x25\x91'), you can use "iconv" to convert the resulting UTF-16BE to UTF-8
(this relies on the simplifying assumption that you only use Unicode code points up to
0xFFFF, which can mostly be treated as UTF-16 if ignoring some edge
cases is acceptable):

    printf "%s\n" 2591 2592 \
       | basenc --base16 --decode | iconv -f utf16be -t utf8

'basenc' (=base-encode) is a new program in coreutils 8.31.
If you don't have it, 'xxd' can also convert the ASCII hex digits to binary:


    printf "%s\n" 2591 2592 | xxd -r -p | iconv -f utf16be -t utf8
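
For code points above 0xFFFF the UTF-16BE shortcut no longer holds, since those need surrogate pairs. One way around that, offered here as an assumption and not as something suggested in the thread, is to emit each code point as four big-endian bytes and let iconv read them as UTF-32BE. The function name `cp_to_utf8` is hypothetical. It accepts values with a 0x prefix and uses only printf octal escapes to produce the bytes, so neither basenc nor xxd is needed.

```shell
# cp_to_utf8 (hypothetical helper): print the UTF-8 encoding of each
# code point given as an argument, e.g. 0x2591 or 0x1F600.
cp_to_utf8() {
    for cp in "$@"; do
        n=$(($cp))                      # shell arithmetic parses the 0x prefix
        for s in 24 16 8 0; do          # most significant byte first (big-endian)
            byte=$(( (n >> s) & 255 ))
            # Build an octal escape such as \045 and let printf emit that byte.
            printf "\\$(printf '%03o' "$byte")"
        done
    done | iconv -f UTF-32BE -t UTF-8
}

cp_to_utf8 0x2591 0x2592 0x1F600
printf '\n'
```

The sketch does no validation of its own, so whether an out-of-range value is rejected depends on the iconv implementation.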

There is also the 'uconv' program from the ICU package (http://site.icu-project.org/), which can do a lot more Unicode conversions.




-assaf

Current Thread

[bug-gawk] How to convert recognize a string as a Unicode char?, Peng Yu, 2019/05/13
- Re: [bug-gawk] How to convert recognize a string as a Unicode char?, Neil R. Ormos, 2019/05/13
  - Re: [bug-gawk] How to convert recognize a string as a Unicode char?, Assaf Gordon <=
    - Re: [bug-gawk] How to convert recognize a string as a Unicode char?, Eli Zaretskii, 2019/05/14
