nmh-workers

Re: [Nmh-workers] nmh architecture discussion: format engine character s


From: Ralph Corderoy
Subject: Re: [Nmh-workers] nmh architecture discussion: format engine character set
Date: Mon, 10 Aug 2015 18:29:39 +0100

Hi Ken,

> > > So ... what would that mean, exactly?  Ignore the locale setting
> > > and always output UTF-8?
> >
> > Well, yes, the code would be writing UTF-8, with the knowledge of
> > how many cells have been occupied, e.g. one for the combining `a⃞',
> > but it could complain about the non-UTF-8 locale setting, or try and
> > set up a `fire and forget' converter on open and opening files if it
> > was easy enough to be worth the bother.
>
> Help me out here, because I'm trying to translate your concepts into
> actual code and I'm having some problems seeing how it would work.

Geez, how much hand-waving do you want a guy to do?  :-)

> Assuming we don't bring in a library like ICU,

GNU's libunistring might be an alternative to ICU.
http://www.gnu.org/software/libunistring/

> it's difficult for us to reliably determine the width of a Unicode
> character.  Specifically:
>
> - The POSIX standard functions for this, wcwidth() and wcswidth(), work
>   on the current locale, which is not guaranteed to support UTF-8 (or
>   even support 8-bit characters).

Agreed, POSIX is useless in this area.
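
To make the hand-waving a bit more concrete, here is a minimal sketch of
counting cells without consulting the locale at all.  It decodes UTF-8 by
hand and uses a toy width table; utf8_decode(), toy_width() and
utf8_cells() are hypothetical names, not nmh code, and the table only
knows two combining blocks.  A real one would need the Unicode data
files, or something like libunistring, and would also have to handle
wide East Asian characters and control characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 if the bytes are malformed.
 * Overlong-form and surrogate checks are omitted to keep the sketch short. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; *cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; *cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; *cp = s[0] & 0x07;
    } else {
        return 0;               /* stray continuation or invalid lead byte */
    }
    if (len > n)
        return 0;               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* ASSUMPTION: a toy table.  Only two combining blocks are zero-width;
 * everything else, including control and wide characters, counts as 1. */
static int toy_width(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F)   /* Combining Diacritical Marks */
        return 0;
    if (cp >= 0x20D0 && cp <= 0x20FF)   /* Combining Marks for Symbols */
        return 0;
    return 1;
}

/* Count the cells occupied by n bytes of UTF-8, or -1 if malformed. */
int utf8_cells(const char *str, size_t n)
{
    const unsigned char *s = (const unsigned char *)str;
    int cells = 0;

    assert(str != NULL);
    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_decode(s, n, &cp);

        if (used == 0)
            return -1;
        cells += toy_width(cp);
        s += used;
        n -= used;
    }
    return cells;
}
```

With this table, utf8_cells("a\xE2\x83\x9E", 4), which is `a' followed by
U+20DE, counts one cell whatever LANG or LC_ALL say.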

> - The xlocale functions which allow one to specify a specific locale
>   to functions like wcwidth() are not part of POSIX.

No.

> - Even if we used xlocale (or just overrode the global locale in every
>   nmh program) it turns out there's not a reliable UTF-8 compatible
>   default we can use; we ran into this in the test suite, some people
>   just don't install all of the locales, so we can't assume en_US.UTF-8
>   (or en_GB.UTF-8, or whatever).

That wouldn't matter if we stopped on a non-UTF-8 locale?
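
Stopping, or just complaining, could be a check on the codeset name
rather than on any particular locale being installed.  A rough sketch,
assuming POSIX nl_langinfo(CODESET) is available; codeset_is_utf8() and
locale_is_utf8() are made-up names, and accepting both the "UTF-8" and
"utf8" spellings is my guess at what different systems report.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* True if the codeset name looks like UTF-8.
 * ASSUMPTION: systems spell it "UTF-8" or "utf8", in either case. */
int codeset_is_utf8(const char *codeset)
{
    return codeset != NULL &&
           (strcasecmp(codeset, "UTF-8") == 0 ||
            strcasecmp(codeset, "UTF8") == 0);
}

/* Adopt the user's LC_CTYPE and report whether its codeset is UTF-8.
 * Returns 0 if the locale named in the environment cannot be set. */
int locale_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program would call locale_is_utf8() once at start-up and then either
carry on, warn, or exit.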

> I'm unclear how you wanted to use the iconv utility; is the idea just
> output everything in UTF-8 and run iconv as a filter for all text
> output?

Yes, as a last-ditch attempt if we carry on.

> I think that might have unintended consequences, but putting
> that aside there are other issues.  For one, iconv can't do character
> substitution on conversion failure (at least the POSIX iconv cannot; I
> am aware that GNU iconv can).  Even if it can, I am unsure we can
> maintain the correct column position when dealing with things like
> combining characters.

Yes, either iconv isn't bothered with, because it's too awkward and the
results are ropey, or it is used because it's good enough most of the
time for the small minority that want it.

> But hey, if I'm wrong I'd be glad to hear about it.  I think it's a
> much tougher problem than people realize.

I'm sure it is.

Cheers, Ralph.


