[Monotone-devel] Re: problems with i18n testsuite
From: Robert Bihlmeyer
Subject: [Monotone-devel] Re: problems with i18n testsuite
Date: Wed, 21 Apr 2004 21:20:25 +0200
User-agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.4 (Security Through Obscurity, linux)
graydon hoare <address@hidden> writes:
> Robert Bihlmeyer wrote:
>
>> I think the best solution is to assume UTF-8, and use LC_CTYPE's
>> charset in case the filename is not valid UTF-8.
>
[...]
> - if I commit on an EUC-KR machine, the filename is not valid UTF-8;
> but the filename is representable in UTF-8 if I do a conversion.
If your LC_CTYPE is something like ko_KR.EUC-KR, my algorithm will work
in this case. If your LC_CTYPE is C or something else entirely, no
automatic guessing will help.
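
A minimal sketch of the fallback described above, in Python rather than
monotone's own code. The function name decode_filename and its
fallback_charset parameter are hypothetical, and
locale.getpreferredencoding(False) is only an approximation of "LC_CTYPE's
charset" on POSIX systems:

```python
import locale
from typing import Optional


def decode_filename(raw: bytes, fallback_charset: Optional[str] = None) -> str:
    """Decode filename bytes: assume UTF-8, else fall back to the locale charset."""
    try:
        # Step 1: assume the name is UTF-8 and accept it if it decodes cleanly.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    if fallback_charset is None:
        # Step 2: use the charset implied by the current locale
        # (assumption: this reflects LC_CTYPE on the user's system).
        fallback_charset = locale.getpreferredencoding(False)
    # Raises UnicodeDecodeError if the fallback charset does not fit either,
    # e.g. when LC_CTYPE is C: no automatic guessing helps in that case.
    return raw.decode(fallback_charset)
```

For example, decode_filename(name_bytes, "euc_kr") accepts both a UTF-8 and
an EUC-KR encoding of the same Korean filename. The guess can still be
wrong: some byte sequences from other charsets happen to be valid UTF-8.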
> - if I checkout from monotone (UTF-8) to an EUC-KR machine, the
> UTF-8 filename is not valid EUC-KR, but it is representable in
> EUC-KR if I do a conversion.
I wasn't thinking of checkout yet. I have a weak preference for
defaulting to UTF-8 filenames.
[snip]
Basically, I want to make the point: the LC_CTYPE of your shell need
not match the charset of all your filenames, or the charset of all
your files' contents. And there is no other way to infer a "local
charset".
I'm still unclear on what you do with file content. Do you convert
from whatever you assume as the local charset to UTF-8 for storage and
hash computation? Wouldn't that fail horribly for non-text content?
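
To illustrate the concern, here is a small Python sketch. It is not
monotone's actual behaviour; the helper to_utf8 is hypothetical, and SHA-1
stands in for whatever hash is computed over the stored content. It shows
that converting content changes the bytes being hashed, and that arbitrary
binary data may not decode in the assumed local charset at all:

```python
import hashlib


def to_utf8(data: bytes, local_charset: str) -> bytes:
    """Hypothetical conversion step: local charset -> UTF-8 before storage."""
    return data.decode(local_charset).encode("utf-8")


# Text content: the conversion succeeds, but the stored bytes (and therefore
# the hash) no longer match the file on disk.
text = "한글\n".encode("euc_kr")
converted = to_utf8(text, "euc_kr")
hash_changed = hashlib.sha1(text).hexdigest() != hashlib.sha1(converted).hexdigest()

# Non-text content: these bytes are not valid EUC-KR, so the conversion
# raises instead of producing anything storable.
binary = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0xFE, 0x00, 0x80])
try:
    to_utf8(binary, "euc_kr")
    binary_survived = True
except UnicodeDecodeError:
    binary_survived = False
```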
I'd really like version control systems to get out of the text
conversion business. Either your editor handles that, or you hang
appropriate tools on pre-checkin and post-checkout hooks.
--
Robbe