Re: Austin Group questions on iconv()
Thu, 09 Mar 2023 19:28:23 +0100
Eric Blake wrote in
|In today's Austin Group meeting, the folks discussing POSIX had a
|question for Bruno and/or anyone else with an idea on how the
|standards should approach a difference in behavior between Solaris and
|GNU iconv() implementations.
|For context, today's meeting minutes:
|https://posix.rhansen.org/p/2023-03-09 around line 1635
(Effectively a no-op to look at, since your email already covers
it, doesn't it?)
|and the bugs leading to the question:
| "0001635: iconv: please be more explicit in input-not-convertible case"
| still open - iconv() resulting in EILSEQ not because of input
| encoding error but because of output being unable to encode the
| "0001007: iconv function not allowed to fail to convert valid sequences"
| resolved at https://austingroupbugs.net/view.php?id=1007#c3330,
| standardizing the //IGNORE, //TRANSLIT, and //NON_IDENTICAL_DISCARD
|It seems that bug 1635 is saying that the Solaris implementation
|provides a conversion that application writers can use to get reliable
|output but does not provide some desired features, and the standard
|should change to acknowledge that the GNU implementation provides some
|of those desired features. However, the GNU implementation includes
That all may be 1007.
|some ambiguities that make it unreliable. It seems to ask us to
|change the standard to allow a modified version of the GNU iconv()
|function that could be reliably interpreted by an application writer.
That is 1635: it gives merit to the GNU approach, which does
|For example, overloading EILSEQ to mean that there was an invalid
|character in the input stream or that there was no transliteration
which application programmers cannot deal with: invalid input and
not being able to convert to some output character set
(losslessly) are very different things. (To at least some
|available in the output codeset to convert that input character makes
|it impossible for an application to determine which of those two
|problems caused iconv() to fail.
|Can we get an explanation on how an application writer is supposed to
|write code to reliably use the iconv() in GNU libc, given the above
|example? Can we get help in identifying exactly what changes need to
I want to urge people to read the GNU bug report that is linked
from 1635 where the honourable author of the GNU iconv library
points to how gnulib does it, which in turn is then quoted again
in issue 1635.
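
To make the problem concrete, here is a minimal sketch of one way an
application could try to tell the two EILSEQ cases apart today: when
the direct conversion fails, re-run the failing position through a
probe conversion to UTF-8. If the probe rejects the same bytes, the
input is invalid; if it accepts them, the target codeset cannot
represent the character. This is an illustration only and is not
claimed to be the gnulib code. The names conv_status,
rejected_by_probe and classify_conversion are made up for this
example, and it assumes a stateless source codeset and an iconv()
that validates input when converting to UTF-8.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical result codes; none of these names come from POSIX or GNU. */
enum conv_status {
    CONV_OK,             /* whole input was converted */
    CONV_INVALID_INPUT,  /* input is malformed in the source codeset */
    CONV_UNCONVERTIBLE,  /* valid input, not representable in the target */
    CONV_OTHER_ERROR     /* iconv_open() failure, E2BIG, EINVAL, ... */
};

/* Probe the bytes at `in` by converting them to UTF-8.
   Returns 1 if the very first character is rejected with EILSEQ,
   0 if it is accepted, -1 if the probe could not be set up. */
static int rejected_by_probe(const char *from, const char *in, size_t inlen)
{
    iconv_t probe = iconv_open("UTF-8", from);
    if (probe == (iconv_t)-1)
        return -1;

    char scratch[16];               /* room for at least one character */
    char *inp = (char *)in, *outp = scratch;
    size_t inleft = inlen, outleft = sizeof scratch;

    size_t r = iconv(probe, &inp, &inleft, &outp, &outleft);
    int saved = errno;
    iconv_close(probe);

    /* Only a failure at the very first byte counts as "rejected". */
    return (r == (size_t)-1 && saved == EILSEQ && inp == in) ? 1 : 0;
}

enum conv_status classify_conversion(const char *to, const char *from,
                                     const char *in, size_t inlen,
                                     char *out, size_t outlen)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return CONV_OTHER_ERROR;

    char *inp = (char *)in, *outp = out;
    size_t inleft = inlen, outleft = outlen;
    enum conv_status st = CONV_OK;

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        if (errno != EILSEQ) {
            st = CONV_OTHER_ERROR;
        } else {
            /* inp now points at the sequence iconv() refused. */
            int rej = rejected_by_probe(from, inp, inleft);
            st = rej == 1 ? CONV_INVALID_INPUT
               : rej == 0 ? CONV_UNCONVERTIBLE
               : CONV_OTHER_ERROR;
        }
    }
    iconv_close(cd);
    return st;
}
```

A caller would pass the codeset names and buffers and switch on the
returned status. The extra descriptor and the second pass are exactly
the kind of workaround that a distinct error code would make
unnecessary.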
|be made to POSIX (after bugid:1007 has been integrated) to allow GNU
|behavior and get reliable results without breaking applications that
|currently work with the Solaris iconv() interface.
And before _this_ of yours starts rolling, i want to throw in that
transliterating characters is not the same as placing
a replacement, nor the same as failing the way GNU does but in a way
that application writers can properly react to.
Application writers need to be able to write tests, but
transliterations may be anything, and may change as time goes by.
Being able to fail fast in case of errors is also an important
property that //TRANSLIT does not provide.
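
As an illustration of the difference, here is a sketch of an
application-controlled replacement: the application decides what is
written for each unconvertible character, so the output is
deterministic and can be asserted in a test. The helper name
convert_with_replacement is hypothetical. The sketch assumes the
input is already known to be valid UTF-8 (so EILSEQ can only mean
"unconvertible"), that the implementation fails with EILSEQ on
unconvertible characters the way GNU does, and that the target is a
single-byte, ASCII-compatible codeset.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with lead byte b. */
static size_t utf8_len(unsigned char b)
{
    if (b >= 0xF0) return 4;
    if (b >= 0xE0) return 3;
    if (b >= 0xC0) return 2;
    return 1;
}

/* Hypothetical helper: convert valid UTF-8 to codeset `to`, writing
   `repl` for every character the target cannot represent.
   Returns the number of bytes written, or (size_t)-1 on any other error. */
size_t convert_with_replacement(const char *to, const char *in, size_t inlen,
                                char *out, size_t outlen, char repl)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in, *outp = out;
    size_t inleft = inlen, outleft = outlen;
    int failed = 0;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                      /* everything left was converted */

        if (errno != EILSEQ || outleft == 0) {
            failed = 1;                 /* fail fast on E2BIG, EINVAL, ... */
            break;
        }

        /* Skip exactly one source character and emit the replacement. */
        size_t skip = utf8_len((unsigned char)*inp);
        if (skip > inleft)
            skip = inleft;
        inp += skip;
        inleft -= skip;
        *outp++ = repl;
        outleft--;
    }

    iconv_close(cd);
    return failed ? (size_t)-1 : (size_t)(outp - out);
}
```

Unlike a transliteration table, the replacement byte here is chosen
by the caller and does not change between library versions. Note
that the loop still has to trust that EILSEQ means "unconvertible",
which only holds because the input was validated beforehand.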
If the standard "invented" a special mode that enforces the GNU
behaviour, but with an identifiable error code instead of the
overloaded EILSEQ, that would allow exactly this.
Software which supports the // modifiers must carry state across
iconv() calls to react or fail properly, so this seems (looking at
open source code) to be a rather minimal change.
(It could be that the standard already adds keywords that require
work in existing implementations. But i am not sure.)
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)