[Monotone-devel] Re: problems with i18n testsuite

From:	graydon hoare
Subject:	[Monotone-devel] Re: problems with i18n testsuite
Date:	Fri, 23 Apr 2004 12:03:56 -0400
User-agent:	Mozilla Thunderbird 0.5 (X11/20040208)

Robert Bihlmeyer wrote:

Basically, I want to make the point: the LC_CTYPE of your shell need
not match the charset of all your filenames, or the charset of all
your files' contents. And there is no other way to infer a "local
charset".

ok. currently monotone treats LC_CTYPE as the charset for file *names*, but by default does no conversion of the file's internal bytes.

I'm still unclear on what you do with file content. Do you convert
from whatever you assume as the local charset to UTF-8 for storage and
hash computation? Wouldn't that fail horribly for non-text content?

no, we don't convert the bytes inside files by default. we provide a place for users to specify a conversion if they want one to happen, but by default that conversion is empty. the only conversion we do by default is manifest pathname <-> filesystem pathname, and that is text content (very regular text content, in fact).
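A minimal sketch of the manifest pathname <-> filesystem pathname mapping described above, assuming the manifest side is always UTF-8 and the filesystem side uses a single named charset. The function names `manifest_to_fs` and `fs_to_manifest` are hypothetical and are not monotone's own code.

```python
def manifest_to_fs(path_utf8: bytes, fs_charset: str) -> bytes:
    """Convert a manifest pathname (UTF-8 bytes) to filesystem bytes."""
    return path_utf8.decode("utf-8").encode(fs_charset)


def fs_to_manifest(path_fs: bytes, fs_charset: str) -> bytes:
    """Convert a filesystem pathname back to the UTF-8 form stored in the manifest."""
    return path_fs.decode(fs_charset).encode("utf-8")


# Round trip for a name that is not pure ASCII.
name_utf8 = "größe.txt".encode("utf-8")
name_latin1 = manifest_to_fs(name_utf8, "iso-8859-1")  # b"gr\xf6\xdfe.txt"
assert fs_to_manifest(name_latin1, "iso-8859-1") == name_utf8
```

Only the pathname passes through these functions; the file's content bytes are never touched.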

I'd really like version control systems to get out of the text
conversion business. Either your editor handles that, or you hang
appropriate tools on pre-checkin and post-checkout hooks.

I mostly agree with you here. as I said, mostly we punt this issue to hooks and try not to enforce any specific conversions. as of the win32 branch -- where I noticed we were doing it wrong -- files are always opened in binary mode and there's no converting.
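To illustrate why binary mode matters for hashing, here is a small sketch that reads a file's raw bytes and hashes them with SHA-1. The helper name `file_id` is hypothetical; the point is only that no newline or charset translation happens between disk and hash.

```python
import hashlib


def file_id(path: str) -> str:
    """Hash the exact bytes on disk; "rb" disables newline and charset translation."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()
```

Because the bytes are read untranslated, non-text content (images, archives) hashes the same on every platform.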

the only thing we need to be sure about wrt. pathnames is that we must have UTF-8 in the files monotone interprets the content of (MT/manifest, MT/work, .mt-attrs). monotone takes those files apart and evaluates them. it reads their contents, semantically. it needs to be able to match regexes against the bytes it finds in a manifest. we'd have to do a lot more contortions if these control files could be in non-UTF-8 charsets.
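A rough sketch of why a fixed charset helps when taking a control file apart with a regex. The line shape used here (40 hex digits, two spaces, then the pathname) and the function name `parse_manifest` are assumptions made for illustration, not a statement of monotone's actual parser.

```python
import re

# Assumed line shape: 40 hex digits, two spaces, then the pathname.
MANIFEST_LINE = re.compile(r"^([0-9a-f]{40})  (.+)$")


def parse_manifest(raw: bytes) -> dict:
    """Map pathname -> hash. Raises UnicodeDecodeError if raw is not UTF-8."""
    entries = {}
    for line in raw.decode("utf-8").split("\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        m = MANIFEST_LINE.match(line)
        if m is None:
            raise ValueError(f"malformed manifest line: {line!r}")
        entries[m.group(2)] = m.group(1)
    return entries
```

With one known encoding, a single decode step precedes all matching. If control files could arrive in arbitrary charsets, the parser would first have to guess or be told the charset of each file.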

but that's really all we need. the decision to externalize those path names in the LC_CTYPE charset is just a convenience for mapping to and from UTF-8, when in an environment which doesn't understand it. the convention is certainly not cast in stone. if you prefer we can make it overridable by a hook, or even default to a hook which normally returns UTF-8 too. I just want people with non-UTF-8 "legacy" systems to be somewhat comfortable, and was under the impression that LC_CTYPE would usually hold their preferred representation.
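The "overridable by a hook" idea could look roughly like the sketch below. Everything here is hypothetical: the name `pathname_charset`, the shape of the hook (a plain callable returning a charset name, whereas monotone's real hooks are written in Lua), and the choice of UTF-8 as fallback when the locale names no codeset. The lookup order LC_ALL, LC_CTYPE, LANG follows the usual POSIX precedence.

```python
def pathname_charset(env: dict, hook=None, default: str = "UTF-8") -> str:
    """Pick the charset used to externalize pathnames.

    A hook, if given and returning a non-empty name, wins. Otherwise the
    codeset part of the effective LC_CTYPE setting is used, else the default.
    """
    if hook is not None:
        chosen = hook()
        if chosen:
            return chosen
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # Locale names look like language_TERRITORY.codeset@modifier.
            _, dot, rest = value.partition(".")
            if dot:
                return rest.partition("@")[0]
            return default  # locale set, but it names no codeset
    return default
```

For example, `pathname_charset({"LC_CTYPE": "de_DE.ISO-8859-1"})` yields `"ISO-8859-1"`, while passing `hook=lambda: "UTF-8"` forces UTF-8 regardless of the environment.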


-graydon
