[bug-gnu-libiconv] unicode normalization / unorm (was: The utf-8-mac enc

bug-gnu-libiconv

From:	Assaf Gordon
Subject:	[bug-gnu-libiconv] unicode normalization / unorm (was: The utf-8-mac encoder on macOS gives incorrect output)
Date:	Thu, 26 Oct 2017 14:23:49 -0600
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

Hello,

(cc'ing coreutils@)

On 2017-10-26 05:50 AM, Marcin Sulikowski wrote:

I've been trying to use libiconv on macOS to convert UTF-8 strings to their NFD form using libiconv's "utf-8-mac" encoding which is available on macOS.


FWIW,

In GNU coreutils we are working on a Unicode normalization program (unorm) which can perform nfd/nfc/nfkd/nfkc conversions and other multibyte character processing.

It is still highly experimental, but produces the following output based on your input:


===
$ ( printf "a%.0s" `seq 4094` ; echo -n ó ) \
     | unorm --normalization=nfd \
     | hexdump -e '8/1 "%02x " "\n"'
61 61 61 61 61 61 61 61
*
61 61 61 61 61 61 6f cc
81
===

Here U+00F3 (UTF-8: \xc3 \xb3) was decomposed to "o" + U+0301, the combining acute accent (UTF-8: \x6f \xcc \x81).
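
The expected bytes can also be cross-checked without unorm. The following is a minimal sketch using Python's standard unicodedata module; it is only an independent reference for the NFD result, not how unorm or libiconv implement normalization, and the helper name nfd_utf8 is made up for this illustration.

```python
import unicodedata


def nfd_utf8(text: str) -> bytes:
    """Return the UTF-8 encoding of `text` after NFD normalization.

    Hypothetical helper for illustration; not part of coreutils or libiconv.
    """
    return unicodedata.normalize("NFD", text).encode("utf-8")


# Same input as the shell pipeline above: 4094 ASCII "a" followed by U+00F3.
data = "a" * 4094 + "\u00f3"
out = nfd_utf8(data)

# The tail should be "o" (0x6f) followed by U+0301 (0xcc 0x81).
print(len(out), out[-3:].hex(" "))
```

The total length is 4097 bytes (4094 + 3), which is why hexdump shows a final row holding only 81 when printing 8 bytes per row.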


More information about the multibyte implementation progress is here:
 https://crashcourse.housegordon.org/coreutils-multibyte-support.html

If you'd like to experiment with the program, a snapshot is here:
http://files.housegordon.org/src/coreutils-multibyte-experimental-8.28.39-79242.tar.xz
(note that this is an unstable and unsupported snapshot of coreutils code).

Any feedback is appreciated.

regards,
 - assaf
