[Monotone-devel] Re: Why is utf8 type _NOVERIFY, and other vocab stuff.
From: Lapo Luchini
Subject: [Monotone-devel] Re: Why is utf8 type _NOVERIFY, and other vocab stuff.
Date: Thu, 15 Feb 2007 20:25:19 +0100
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.0.9) Gecko/20061207 Thunderbird/1.5.0.9 Mnenhy/0.7.4.0
Zack Weinberg wrote:
> The //IGNORE and //TRANSLIT features are glibc / GNU libiconv
> specific, but I would have thought that they were available in recent
> Gentoo (they've been around since 2001 give or take).
I thought they would be present on *most* BSD and Linux systems available today...
Uh. I know nothing about Gentoo; I would have thought GNU libiconv was in
Portage, but this doesn't seem to be it at all:
http://gentoo-portage.com/dev-libs/libiconv
> The real problem, though, is that an awful lot of non-GNUish systems
> have iconv implementations that are useless. I mean _useless_. They
> implement hardly any conversions at all. We have to have the "(list
> of names for ASCII) <-> UTF8" shortcut for _correctness_, not just for
> speed; real live systems don't support conversion between their own
> locale's name for ASCII and UTF-8. *headdesk*
Well, an iconv that doesn't even know how to convert *to* UTF8
is no good for us: we simply can't use it.
An iconv that doesn't know about //IGNORE//TRANSLIT, OTOH, is good for
the strict sanity conversion, but not good for the "best effort"
print-to-the-terminal that I wired into "mtn log" (but other places
would need that, too).
I guess the "solution" could be to add an autoconf test for support of
//IGNORE//TRANSLIT and, when it is not available, fall back on a
"quick&dirty" lossy conversion from UTF8 to either Latin1 or ASCII,
which is easy to write:
/* u is an already-decoded Unicode code point, not a raw UTF-8 byte */
#define UTF8_to_Latin1(u) (((u) >= 256) ? '?' : (char)(u))
#define UTF8_to_ASCII(u) (((u) >= 128) ? '?' : (char)(u))
Or maybe we could get the "transliteration table" right out of iconv...
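The macros above work on one code point at a time, so a real fallback would also have to decode the UTF-8 bytes first. Here is a rough sketch of how that could look; the names utf8_decode and utf8_to_lossy are hypothetical, and the limit parameter (128 for ASCII, 256 for Latin1) is just my way of folding both macros into one function. Malformed or overlong sequences are turned into '?' as well.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence from s (n >= 1 bytes available).
   Stores the code point in *cp and returns the bytes consumed.
   Malformed input yields U+FFFD; a bad lead byte or a broken
   continuation consumes a single byte. */
static size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    static const unsigned long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    unsigned long c;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else { *cp = 0xFFFD; return 1; }       /* stray continuation or bad lead */

    if (len > n) { *cp = 0xFFFD; return 1; }   /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[len])
        c = 0xFFFD;                        /* overlong encoding */
    *cp = c;
    return len;
}

/* Lossy conversion: every code point below limit is copied as a single
   byte, everything else becomes '?'.  dst must have room for n + 1
   bytes; returns the number of bytes written, not counting the NUL. */
static size_t utf8_to_lossy(const char *src, size_t n, char *dst,
                            unsigned long limit)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < n) {
        unsigned long cp;
        in += utf8_decode(s + in, n - in, &cp);
        dst[out++] = (cp < limit) ? (char)(unsigned char)cp : '?';
    }
    dst[out] = '\0';
    return out;
}
```

The output can never be longer than the input, since each decoded sequence consumes at least one byte and emits exactly one. Unlike //TRANSLIT, this does no transliteration at all: with limit 128, U+00B7 becomes '?' rather than '.'.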
> It might be possible to bundle GNU libiconv, but I hesitate to
> recommend that because I recall its being another Haible/Drepper build
> system monstrosity like intl.
IMHO we already bundle too much =)
> Many systems have an iconv(1) command line utility that may be helpful
> here.
Uh, right, but writing a "known good UTF-8 string" escaped on the
command line seems a bit trickier to me... no, not really.
% echo "\xC2\xB7" | iconv -f UTF-8 -t CP1252//IGNORE//TRANSLIT
· (that is, the correct and converted U+00B7 MIDDLE DOT)
% echo "\xC2\xB7" | iconv -f UTF-8 -t ASCII//IGNORE//TRANSLIT
.
% echo "\xC3\x80" | iconv -f UTF-8 -t CP1252//IGNORE//TRANSLIT
À (that is, correct U+00C0 LATIN CAPITAL LETTER A WITH GRAVE)
% echo "\xC3\x80" | iconv -f UTF-8 -t ASCII//IGNORE//TRANSLIT
`A
Derek (or anyone else with Gentoo), what do you get with these?
Lapo