discuss-gnustep

_DefaultStringEncoding


From: Bruno Haible
Subject: _DefaultStringEncoding
Date: Fri, 17 Oct 2003 16:14:07 +0200
User-agent: KMail/1.5

Hi,

NSString._DefaultStringEncoding is set from the return value of
GetDefEncoding() in Unicode.m.

I have three questions about it.

1) Why must GNUSTEP_STRING_ENCODING be set to one of the GNUstep-specific
   names { "NSISOLatin1StringEncoding", "NSJapaneseEUCStringEncoding", ... }
   rather than one of the widely known, standardized names
         { "ISO-8859-1", "EUC-JP", ... }
   ? This makes it needlessly hard for users.

2) Why does gnustep-base-1.8.0/Documentation/Base.gsdoc say that the value
   of GNUSTEP_STRING_ENCODING
       "may be any of the 8-bit encodings supported by your system
        (excluding multi-byte encodings)" ?
   I've set it to NSUTF8StringEncoding and the Hello world program displays
   its greeting message (in German, non-ASCII of course) just fine.

3) If GNUSTEP_STRING_ENCODING is not set, why is the default value
   (set in Unicode.m:580) ISO-8859-1? On POSIX systems, all programs
   are expected to interpret file names and file contents according to
   the encoding given by the current locale (nl_langinfo (CODESET)).
   IMO this codeset should be queried and translated into the equivalent
   GNUstep-specific name. I'm using a de_DE.UTF-8 locale and all
   my local files are UTF-8 encoded.
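
   A rough sketch of what such a translation could look like, in C. The
   function names and the lookup table are hypothetical and not taken from
   Unicode.m; only the three GNUstep names mentioned in this mail plus
   NSASCIIStringEncoding are listed. Codeset spellings differ between
   platforms (for example "ISO8859-1" on some systems), so a real table
   would need more aliases. Unknown codesets fall back to the current
   ISO-8859-1 default.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical table: POSIX codeset name -> GNUstep encoding name. */
struct codeset_map {
    const char *codeset;
    const char *gnustep_name;
};

static const struct codeset_map codeset_table[] = {
    { "UTF-8",          "NSUTF8StringEncoding" },
    { "ISO-8859-1",     "NSISOLatin1StringEncoding" },
    { "EUC-JP",         "NSJapaneseEUCStringEncoding" },
    { "ANSI_X3.4-1968", "NSASCIIStringEncoding" },  /* glibc's name for ASCII */
};

/* Translate a codeset name; unknown names fall back to ISO-8859-1,
 * which is the default described above for Unicode.m. */
const char *gnustep_encoding_for_codeset(const char *codeset)
{
    assert(codeset != NULL);
    size_t count = sizeof codeset_table / sizeof codeset_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcasecmp(codeset, codeset_table[i].codeset) == 0)
            return codeset_table[i].gnustep_name;
    }
    return "NSISOLatin1StringEncoding";
}

/* Ask the current locale for its codeset and translate it.
 * Assumption: it is acceptable to call setlocale() here; a library
 * would more likely expect the application to have done so already. */
const char *gnustep_encoding_for_current_locale(void)
{
    setlocale(LC_CTYPE, "");
    return gnustep_encoding_for_codeset(nl_langinfo(CODESET));
}
```

   In a de_DE.UTF-8 locale on a glibc system, nl_langinfo (CODESET) returns
   "UTF-8", so the second function would select NSUTF8StringEncoding. The
   same table would also let GNUSTEP_STRING_ENCODING accept the standardized
   names asked for in question 1.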

   The situation for URLs is different; for files read from arbitrary
   URLs the following heuristic makes sense:
     - If the content is valid UTF-8, then assume it is UTF-8.
     - Otherwise assume it is ISO-8859-1.
   The reason why this heuristic works well in practice is that normal
   human-written ISO-8859-1 texts have a ~ 99.8% probability of being
   invalid UTF-8.
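
   The heuristic above can be sketched in C as follows. This is only an
   illustration, not GNUstep code: is_valid_utf8 and guess_encoding are
   hypothetical names. The validity check follows the well-formed byte
   sequence ranges of the Unicode standard, so overlong forms, surrogates
   and values above U+10FFFF are rejected. Note that pure ASCII input is
   valid UTF-8 and is therefore reported as UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if buf[0..len) is well-formed UTF-8. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;                            /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* range of the first one */

        if (c <= 0x7F) { i++; continue; }    /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; }
        else if (c == 0xE0) { n = 2; lo = 0xA0; }   /* no overlong forms */
        else if (c >= 0xE1 && c <= 0xEC) { n = 2; }
        else if (c == 0xED) { n = 2; hi = 0x9F; }   /* no surrogates */
        else if (c >= 0xEE && c <= 0xEF) { n = 2; }
        else if (c == 0xF0) { n = 3; lo = 0x90; }   /* no overlong forms */
        else if (c >= 0xF1 && c <= 0xF3) { n = 3; }
        else if (c == 0xF4) { n = 3; hi = 0x8F; }   /* at most U+10FFFF */
        else return false;                   /* 0x80..0xC1, 0xF5..0xFF */

        if (len - i <= n)
            return false;                    /* sequence is truncated */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;
        }
        i += n + 1;
    }
    return true;
}

/* The heuristic: valid UTF-8 is taken as UTF-8, anything else as ISO-8859-1. */
const char *guess_encoding(const unsigned char *buf, size_t len)
{
    return is_valid_utf8(buf, len) ? "UTF-8" : "ISO-8859-1";
}
```

   For instance, the ISO-8859-1 bytes of a German word with an u-umlaut
   (0xFC) fail the check, because 0xFC can never start a UTF-8 sequence,
   so the text is classified as ISO-8859-1.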

Bruno




