Re: _DefaultStringEncoding

From:	Richard Frith-Macdonald
Subject:	Re: _DefaultStringEncoding
Date:	Sat, 18 Oct 2003 07:20:09 +0100


On Friday, October 17, 2003, at 03:14 PM, Bruno Haible wrote:

Hi,
NSString._DefaultStringEncoding is determined as the value of GetDefEncoding()
in Unicode.m.

I have three questions about it.

1) Why are the possible values of GNUSTEP_STRING_ENCODING in the
range { "NSISOLatin1StringEncoding", "NSJapaneseEUCStringEncoding",... }
   and not the widely known and standardized names
         { "ISO-8859-1", "EUC-JP", ... }
   ? This makes it needlessly hard for users.

Because the OpenStep standard names are used ... but I agree there is no reason
why the names that iconv supports should not be acceptable as well. I've fixed
that.
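
A minimal sketch of accepting both naming schemes, not the actual change made
in Unicode.m. The table and the function canonical_encoding_name() are
hypothetical; only the three name pairs shown are assumed, and the comparison
is made case-insensitive for the user's convenience.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical alias table: OpenStep constant name and the iconv-style name. */
struct encoding_alias {
  const char *openstep_name;
  const char *iconv_name;
};

static const struct encoding_alias aliases[] = {
  { "NSISOLatin1StringEncoding",   "ISO-8859-1" },
  { "NSJapaneseEUCStringEncoding", "EUC-JP"     },
  { "NSUTF8StringEncoding",        "UTF-8"      },
};

/* Return the OpenStep name for either spelling, or NULL if unknown. */
const char *canonical_encoding_name(const char *name)
{
  if (name == NULL)
    return NULL;
  for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    {
      if (strcasecmp(name, aliases[i].openstep_name) == 0
          || strcasecmp(name, aliases[i].iconv_name) == 0)
        return aliases[i].openstep_name;
    }
  return NULL;
}
```

With such a table, setting GNUSTEP_STRING_ENCODING to either "EUC-JP" or
"NSJapaneseEUCStringEncoding" would select the same encoding.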

2) Why does gnustep-base-1.8.0/Documentation/Base.gsdoc say that the value
   of GNUSTEP_STRING_ENCODING
       "may be any of the 8-bit encodings supported by your system
        (excluding multi-byte encodings)" ?
   I've set it to NSUTF8StringEncoding and the Hello world program displays
   its greeting message (in German, non-ASCII of course) just fine.

It's an error ... that restriction used to be there a few years ago, but is no
longer the case.  I've updated the documentation.

3) If GNUSTEP_STRING_ENCODING is not set, why is the default value
   (set in Unicode.m:580) ISO-8859-1? On POSIX systems, all programs
   are expected to interpret file names and file contents according to
   the encoding given by the current locale (nl_langinfo (CODESET)).
   IMO this codeset should be taken and transformed into the GNUstep
   specific equivalent name. I'm using a de_DE.UTF-8 locale and all
   my local files are UTF-8 encoded.


As far as I'm aware ... there is no particular reason why GNUstep should
not be POSIX compliant as long as it doesn't seriously conflict with OpenStep
and Apple compatibility. I'd be happy to accept a patch to make this change
as long as nobody knows of a good reason not to.
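
A rough sketch of what the proposed lookup order might look like, assuming a
POSIX system. The function default_encoding_name() is hypothetical and is not
the code in Unicode.m; it returns the raw codeset name and leaves the
translation to a GNUstep encoding constant to the caller.

```c
#include <langinfo.h>  /* nl_langinfo, CODESET (POSIX) */
#include <locale.h>    /* setlocale */
#include <stddef.h>
#include <stdlib.h>    /* getenv */

/*
 * Hypothetical helper: pick the name of the default string encoding.
 *   1. GNUSTEP_STRING_ENCODING, if set and non-empty.
 *   2. Otherwise the codeset of the current locale.
 *   3. Otherwise the existing default, ISO-8859-1.
 * Note: setlocale() changes process-wide state, so real library code
 * may prefer to rely on the application having called it already.
 */
const char *default_encoding_name(void)
{
  const char *env = getenv("GNUSTEP_STRING_ENCODING");

  if (env != NULL && env[0] != '\0')
    return env;

  if (setlocale(LC_CTYPE, "") != NULL)
    {
      const char *codeset = nl_langinfo(CODESET);

      if (codeset != NULL && codeset[0] != '\0')
        return codeset;
    }
  return "ISO-8859-1";
}
```

In a de_DE.UTF-8 locale with the environment variable unset, nl_langinfo
(CODESET) is expected to report UTF-8, which would then need to be mapped to
NSUTF8StringEncoding.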

   The situation for URLs is different; for files read from arbitrary
   URLs the following heuristic makes sense:
     - If the contents is valid UTF-8, then assume it is UTF-8.
     - Otherwise assume it is ISO-8859-1.
   The reason why this heuristic works well in practice is that normal
   human-written ISO-8859-1 texts have a ~ 99.8% probability of being
   invalid UTF-8.
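
The heuristic above can be sketched as follows. This is an illustration only:
is_valid_utf8() and guess_encoding() are hypothetical names, and the validator
is a plain strict check (it rejects overlong forms, surrogates and code points
above U+10FFFF), not code taken from GNUstep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the n bytes at s form a well-formed UTF-8 sequence. */
bool is_valid_utf8(const unsigned char *s, size_t n)
{
  size_t i = 0;

  while (i < n)
    {
      unsigned char c = s[i];
      unsigned long cp, min;
      size_t need;               /* number of continuation bytes expected */

      if (c < 0x80) { i++; continue; }                 /* plain ASCII */
      else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
      else return false;         /* stray continuation or invalid lead byte */

      if (n - i - 1 < need)
        return false;            /* sequence truncated at end of input */

      for (size_t k = 1; k <= need; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;        /* not a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      /* Reject overlong encodings, surrogates and out-of-range values. */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += need + 1;
    }
  return true;
}

/* Valid UTF-8 is assumed to be UTF-8; anything else is taken as ISO-8859-1. */
const char *guess_encoding(const unsigned char *s, size_t n)
{
  return is_valid_utf8(s, n) ? "UTF-8" : "ISO-8859-1";
}
```

Pure ASCII input passes the check and is reported as UTF-8, which is harmless
because ASCII text decodes identically under both encodings.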
