discuss-gnustep

Re: _DefaultStringEncoding


From: Richard Frith-Macdonald
Subject: Re: _DefaultStringEncoding
Date: Sat, 18 Oct 2003 07:20:09 +0100


On Friday, October 17, 2003, at 03:14 PM, Bruno Haible wrote:

Hi,

NSString._DefaultStringEncoding is determined as the value of GetDefEncoding()
in Unicode.m.

I have three questions about it.

1) Why are the possible values of GNUSTEP_STRING_ENCODING in the
   range { "NSISOLatin1StringEncoding", "NSJapaneseEUCStringEncoding", ... }
   and not the widely known and standardized names
         { "ISO-8859-1", "EUC-JP", ... }?
   This makes it needlessly hard for users.

Because the OpenStep standard names are used ... but I agree there is no reason why the names that iconv supports should not be acceptable as well. I've fixed
that.
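One way to accept both spellings is a small alias table consulted case-insensitively. The following is a minimal C sketch under that assumption. The struct, the table contents, and the functions `names_equal` and `canonical_encoding_name` are hypothetical illustrations, not GNUstep's actual code in Unicode.m. A real fix might instead delegate name lookup to iconv.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical alias table: each row pairs an OpenStep encoding name with
   the standard (IANA/iconv-style) name of the same character set. */
struct encoding_alias
{
  const char *openstep;
  const char *standard;
};

static const struct encoding_alias aliases[] = {
  { "NSASCIIStringEncoding",       "US-ASCII"   },
  { "NSISOLatin1StringEncoding",   "ISO-8859-1" },
  { "NSJapaneseEUCStringEncoding", "EUC-JP"     },
  { "NSUTF8StringEncoding",        "UTF-8"      },
};

/* Returns 1 if a and b are equal ignoring ASCII case, 0 otherwise. */
static int names_equal(const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0')
    {
      if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
        return 0;
      a++;
      b++;
    }
  return *a == *b;   /* equal only if both strings ended together */
}

/* Accepts either spelling and returns the canonical OpenStep name,
   or NULL if the name is not in the table. */
static const char *canonical_encoding_name(const char *name)
{
  for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    {
      if (names_equal(name, aliases[i].openstep)
          || names_equal(name, aliases[i].standard))
        return aliases[i].openstep;
    }
  return NULL;
}
```

With this table, `canonical_encoding_name("iso-8859-1")` yields "NSISOLatin1StringEncoding". An unlisted name such as "KOI8-R" yields NULL, which leaves the fallback decision to the caller.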

2) Why does gnustep-base-1.8.0/Documentation/Base.gsdoc say that the value
   of GNUSTEP_STRING_ENCODING
       "may be any of the 8-bit encodings supported by your system
        (excluding multi-byte encodings)" ?
   I've set it to NSUTF8StringEncoding and the Hello world program displays
   its greeting message (in German, non-ASCII of course) just fine.

It's an error ... that restriction used to be there a few years ago, but is no longer
the case.  I've updated the documentation.

3) If GNUSTEP_STRING_ENCODING is not set, why is the default value
   (set in Unicode.m:580) ISO-8859-1? On POSIX systems, all programs
   are expected to interpret file names and file contents according to
   the encoding given by the current locale (nl_langinfo (CODESET)).
   IMO this codeset should be taken and transformed into the GNUstep
   specific equivalent name. I'm using a de_DE.UTF-8 locale and all
   my local files are UTF-8 encoded.

As far as I'm aware ... there is no particular reason why GNUstep should
not be POSIX compliant, as long as it doesn't seriously conflict with OpenStep and Apple compatibility. I'd be happy to accept a patch to make this change,
as long as nobody knows of a good reason not to.
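The proposed default could look roughly like the following C sketch. The codeset table, `same_name`, and `default_encoding_name` are hypothetical. The codeset strings returned by `nl_langinfo (CODESET)` differ between C libraries, so a real patch would need a broader mapping. The sketch also assumes the program has already called `setlocale (LC_ALL, "")`.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <langinfo.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical codeset-to-OpenStep table.  Spellings vary between C
   libraries: glibc reports "ANSI_X3.4-1968" for the "C" locale, while
   other systems may report "US-ASCII". */
static const struct
{
  const char *codeset;
  const char *openstep;
} codesets[] = {
  { "UTF-8",          "NSUTF8StringEncoding"        },
  { "ISO-8859-1",     "NSISOLatin1StringEncoding"   },
  { "EUC-JP",         "NSJapaneseEUCStringEncoding" },
  { "ANSI_X3.4-1968", "NSASCIIStringEncoding"       },
  { "US-ASCII",       "NSASCIIStringEncoding"       },
};

/* Returns 1 if a and b are equal ignoring ASCII case, 0 otherwise. */
static int same_name(const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0')
    {
      if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
        return 0;
      a++;
      b++;
    }
  return *a == *b;
}

/* Proposed default: an explicit GNUSTEP_STRING_ENCODING wins; otherwise
   follow the codeset of the current LC_CTYPE locale; if that codeset is
   not in the table, keep today's default of ISO-8859-1.
   Assumes the program has already called setlocale(LC_ALL, ""). */
static const char *default_encoding_name(void)
{
  const char *env = getenv("GNUSTEP_STRING_ENCODING");
  if (env != NULL && *env != '\0')
    return env;   /* a real patch would validate this name too */

  const char *codeset = nl_langinfo(CODESET);
  for (size_t i = 0; i < sizeof codesets / sizeof codesets[0]; i++)
    {
      if (same_name(codeset, codesets[i].codeset))
        return codesets[i].openstep;
    }
  return "NSISOLatin1StringEncoding";
}
```

Under a de_DE.UTF-8 locale, glibc reports the codeset as "UTF-8". A program that calls `setlocale (LC_ALL, "")` at startup would therefore pick NSUTF8StringEncoding without GNUSTEP_STRING_ENCODING being set.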

   The situation for URLs is different; for files read from arbitrary
   URLs the following heuristic makes sense:
     - If the contents are valid UTF-8, then assume it is UTF-8.
     - Otherwise assume it is ISO-8859-1.
   The reason why this heuristic works well in practice is that normal
   human-written ISO-8859-1 texts have a ~ 99.8% probability of being
   invalid UTF-8.
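The URL heuristic above can be sketched in C as follows. The validator checks well-formedness as defined in RFC 3629: it rejects overlong forms, UTF-16 surrogates, and code points above U+10FFFF. The names `is_valid_utf8`, `guess_url_encoding`, and the `guessed_encoding` enum are hypothetical and are not part of any GNUstep API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for the URL heuristic. */
typedef enum { GUESS_UTF8, GUESS_ISO_8859_1 } guessed_encoding;

/* Returns true if buf[0..len) is well-formed UTF-8 per RFC 3629. */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = buf[i];
      size_t n;                            /* continuation bytes expected */
      unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

      if (c < 0x80)
        {
          i++;                             /* plain ASCII byte */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF) n = 1;
      else if (c == 0xE0) { n = 2; lo = 0xA0; }   /* reject overlongs */
      else if (c == 0xED) { n = 2; hi = 0x9F; }   /* reject surrogates */
      else if (c >= 0xE1 && c <= 0xEF) n = 2;
      else if (c == 0xF0) { n = 3; lo = 0x90; }   /* reject overlongs */
      else if (c == 0xF4) { n = 3; hi = 0x8F; }   /* cap at U+10FFFF */
      else if (c >= 0xF1 && c <= 0xF3) n = 3;
      else return false;   /* 80..C1 and F5..FF never start a sequence */

      if (len - i - 1 < n)
        return false;                      /* truncated sequence */
      if (buf[i + 1] < lo || buf[i + 1] > hi)
        return false;
      for (size_t k = 2; k <= n; k++)
        if (buf[i + k] < 0x80 || buf[i + k] > 0xBF)
          return false;
      i += n + 1;
    }
  return true;
}

/* Valid UTF-8 is taken as UTF-8; anything else is assumed ISO-8859-1. */
static guessed_encoding guess_url_encoding(const unsigned char *buf, size_t len)
{
  return is_valid_utf8(buf, len) ? GUESS_UTF8 : GUESS_ISO_8859_1;
}
```

Pure ASCII input is reported as UTF-8. This is harmless because ASCII bytes mean the same thing in both encodings. Latin-1 text with accented letters, such as "Grüß" encoded as 47 72 FC DF, fails at the byte FC, which can never appear in valid UTF-8.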




