Re: [Nmh-workers] nmh architecture discussion: format engine character set

From: Ken Hornstein
Subject: Re: [Nmh-workers] nmh architecture discussion: format engine character set
Date: Mon, 10 Aug 2015 13:09:34 -0400

>> So ... what would that mean, exactly?  Ignore the locale setting and
>> always output UTF-8?
>Well, yes, the code would be writing UTF-8, with the knowledge of how
>many cells have been occupied, e.g. one for the combining `a⃞', but it
>could complain about the non-UTF-8 locale setting, or try and set up
>`fire and forget' converter on open and opening files if it was easy
>enough to be worth the bother.

Help me out here, because I'm trying to translate your concepts into
actual code and I'm having some problems seeing how it would work.

Assuming we don't bring in a library like ICU, it's difficult for us
to reliably determine the width of a Unicode character.  Specifically:

- The POSIX standard functions for this, wcwidth() and wcswidth(), work
  on the current locale, which is not guaranteed to support UTF-8 (or
  even support 8-bit characters).

- The xlocale functions, which allow one to pass a specific locale
  to functions like wcwidth(), are not part of POSIX.

- Even if we used xlocale (or just overrode the global locale in every
  nmh program) it turns out there's not a reliable UTF-8 compatible
  default we can use; we ran into this in the test suite, some people
  just don't install all of the locales, so we can't assume en_US.UTF-8
  (or en_GB.UTF-8, or whatever).
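
To make the first point concrete, here is a minimal sketch of measuring
column width with only the POSIX functions.  The name display_width() is
hypothetical (it is not an existing nmh function), and the result depends
entirely on whatever locale happens to be in effect when it is called.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() on glibc */
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: return the number of terminal columns occupied by
 * the multibyte string s, as interpreted in the CURRENT locale.  Returns
 * -1 if s contains an invalid or truncated sequence, or a character
 * whose width the locale does not know (wcwidth() returned -1).
 */
static int
display_width(const char *s)
{
    mbstate_t state;
    size_t remaining;
    int columns = 0;

    assert(s != NULL);
    remaining = strlen(s);
    memset(&state, 0, sizeof state);

    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);
        int w;

        if (n == (size_t) -1 || n == (size_t) -2)
            return -1;          /* invalid or incomplete sequence */
        if (n == 0)
            break;              /* decoded a NUL character */

        w = wcwidth(wc);        /* 0 for combining characters, if known */
        if (w < 0)
            return -1;          /* not printable in this locale */

        columns += w;
        s += n;
        remaining -= n;
    }

    return columns;
}
```

A caller would need setlocale(LC_ALL, "") beforehand to pick up the user's
locale.  Without a UTF-8 locale in effect, UTF-8 input is either rejected
by mbrtowc() or measured byte by byte, which is exactly the dependency
described above.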

I'm unclear how you wanted to use the iconv utility; is the idea to just
output everything in UTF-8 and run iconv as a filter for all text
output?  I think that might have unintended consequences, but putting
that aside there are other issues.  For one, iconv can't do character
substitution on conversion failure (at least the POSIX iconv cannot; I
am aware that GNU iconv can).  Even if it can, I am unsure we can maintain
the correct column position when dealing with things like combining
characters.
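
As a rough illustration of what substitution by hand might look like with
the iconv(3) library calls, here is a sketch.  The function names are
hypothetical, and it assumes the input is UTF-8, the target encoding is
stateless and ASCII-compatible, and the implementation reports an
unconvertible character with EILSEQ (POSIX leaves that behavior up to the
implementation, so some libraries substitute a character of their own
instead).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence introduced by lead byte c; any byte
 * that is not a multi-byte lead byte counts as 1 so it is skipped alone. */
static size_t
utf8_seq_len(unsigned char c)
{
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/*
 * Hypothetical helper: convert the UTF-8 string `in` to encoding `tocode`,
 * storing a NUL-terminated result in out[outsize].  Each character that
 * iconv() rejects with EILSEQ is replaced by '?'.  Returns 0 on success,
 * -1 on any other failure (unknown encoding, output buffer too small,
 * truncated input).
 */
static int
convert_with_substitution(const char *tocode, const char *in,
                          char *out, size_t outsize)
{
    iconv_t cd;
    char *inp, *outp;
    size_t inleft, outleft;
    int status = 0;

    assert(tocode != NULL && in != NULL && out != NULL);
    if (outsize == 0)
        return -1;

    cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t) -1)
        return -1;

    inp = (char *) in;          /* iconv() does not modify the input */
    inleft = strlen(in);
    outp = out;
    outleft = outsize - 1;      /* reserve room for the trailing NUL */

    while (inleft > 0) {
        size_t skip;

        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t) -1)
            break;              /* all remaining input was converted */

        if (errno != EILSEQ || outleft == 0) {
            status = -1;        /* E2BIG, EINVAL, or no room for '?' */
            break;
        }

        /* Skip one code point and substitute a single '?' for it. */
        skip = utf8_seq_len((unsigned char) *inp);
        if (skip > inleft)
            skip = inleft;
        inp += skip;
        inleft -= skip;
        *outp++ = '?';
        outleft--;
    }

    *outp = '\0';
    iconv_close(cd);
    return status;
}
```

Note that this substitutes per code point.  A base letter followed by a
combining character occupies one cell in UTF-8 but would come out as the
letter plus a separate '?', two cells, so any column count computed
before the conversion would no longer match the output.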

But hey, if I'm wrong I'd be glad to hear about it.  I think it's a much
tougher problem than people realize.

