Dear libiconv Team,
I've been trying to use libiconv on macOS to convert UTF-8 strings to their NFD (canonically decomposed) form, using libiconv's "utf-8-mac" encoding, which is available on macOS. This does not always work: in some cases the iconv() function produces incorrect output, apparently when a character being decomposed cannot be written in full to the supplied output buffer. The bug is easy to reproduce with the iconv command-line tool on macOS:
( printf "a%.0s" `seq 4094` ; echo -n ó ) | iconv -f utf-8 -t utf-8-mac | iconv -f utf-8-mac -t utf-8
The argument to echo is U+00F3 LATIN SMALL LETTER O WITH ACUTE, which decomposes to 'o' (U+006F) followed by U+0301 COMBINING ACUTE ACCENT. I'd expect the pipeline to print aaaaaaa...aaaaaó, i.e. exactly the same output as the part before the first pipe produces on its own, but I get aaaaa...aaaaaaao instead. The accent is lost because `iconv -f utf-8 -t utf-8-mac` does not emit the trailing combining character:
$ ( printf "a%.0s" `seq 4094` ; echo -n ó ) | iconv -f utf-8 -t utf-8-mac | hexdump -C
00000000  61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61  |aaaaaaaaaaaaaaaa|
*
00000ff0  61 61 61 61 61 61 61 61  61 61 61 61 61 61 6f     |aaaaaaaaaaaaaao|
00000fff
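The byte counts show why this particular input triggers the problem. The sketch below assumes only a POSIX shell with seq, wc and tr. It measures the input and the output I would expect, writing the decomposed 'o' + U+0301 by hand. My guess that the iconv tool converts in 4096-byte chunks is inferred from where the failure occurs; I have not checked it in the source.

```shell
#!/bin/sh
# Input fed to iconv: 4094 ASCII 'a' bytes, then U+00F3 (2 bytes in UTF-8: c3 b3).
in_len=$( ( printf "a%.0s" $(seq 4094) ; printf '\303\263' ) | wc -c | tr -d ' ' )

# Expected NFD output: the same 4094 'a' bytes, then 'o' (6f) and
# U+0301 COMBINING ACUTE ACCENT (2 bytes in UTF-8: cc 81), written out by hand.
nfd_len=$( ( printf "a%.0s" $(seq 4094) ; printf 'o\314\201' ) | wc -c | tr -d ' ' )

echo "input bytes:           $in_len"     # 4096
echo "expected output bytes: $nfd_len"    # 4097
echo "observed output bytes: $((0xfff))"  # 4095, from the hexdump above
```

The expected output is 4097 bytes, one byte past a 4096-byte boundary, while the hexdump ends at 0xfff (4095 bytes). Everything up to and including the 'o' was written, and the two-byte combining acute (cc 81) that did not fit was dropped instead of being carried over to the next chunk.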
This is the version of iconv that I'm using:
$ iconv --version
iconv (GNU libiconv 1.11)
Copyright (C) 2000-2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Bruno Haible.
Regards,
Marcin Sulikowski