discuss-gnustep

Re: plists in UTF8


From: David Ayers
Subject: Re: plists in UTF8
Date: Wed, 14 Jun 2006 14:12:23 +0200
User-agent: Mozilla Thunderbird 1.0.2 (X11/20060423)

Richard Frith-Macdonald wrote:
> 
> On 14 Jun 2006, at 11:43, David Wetzel wrote:
> 
>> Hi folks,
>>
>> plparse does not work with plists that contain UTF8 Cyrillic chars.
>>
>> Property List Editor.app on Mac OS X does.
>>
>> File says: ru.plist: UTF-8 Unicode C program text
>>
>> May we change this behaviour?
> 
> 
> Well is it a bug? ... plparse is intended to provide a check that a
> file contains a valid property list ... but it could easily be the case
> that 'Property List Editor.app' will edit invalid property lists (fault
> tolerance makes sense in an editor but not in a checker) ... so what
> you probably need to determine is if there is a bug in plparse.
> 
> A valid property list may ...
> 
> 1. Be ASCII data (with \U escapes for unicode)
> 2. Be UTF-16 with a leading BOM to identify it
> 3. Be UTF-8 with a leading BOM to identify it
> 
> I guess in theory an XML property list could also specify its character
> encoding in the header but we don't have support for that.
> 
> Anything else is invalid ... because it's non-portable and the meaning
> of the data in the file would change if you opened the file using
> another locale.
> 
> I guess if you want plparse to accept non-portable files (ie guess that
> the encoding is that of the current locale), you could provide a patch
> to add a command-line option to get it to do that.
> eg. plparse -PermitNonPortable YES filename
> 
> I don't think that would cause problems for anyone.

The issue is whether a UTF-8 plist without a BOM is a valid plist, or
whether it should be considered non-portable.

I've often read that BOMs in UTF-8 files cause issues (see e.g.
http://en.wikipedia.org/wiki/Byte_Order_Mark).  It becomes a problem
when multiple text files are concatenated, and someone (I think it was
you) told me that BOMs within files have been deprecated.  (I wonder
whether cat(1) or its underlying facilities will be patched to handle this.)
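
To illustrate the concatenation problem, here is a small Python sketch
(the file contents are made up for the example).  Joining two
BOM-prefixed UTF-8 byte strings leaves the second BOM in the middle of
the data, where a decoder that only strips a leading BOM keeps it as
U+FEFF:

```python
import codecs

# Two hypothetical UTF-8 "files", each starting with a BOM.
first = codecs.BOM_UTF8 + "first\n".encode("utf-8")
second = codecs.BOM_UTF8 + "second\n".encode("utf-8")

# What a byte-wise concatenation (as cat(1) does) produces.
joined = first + second

# "utf-8-sig" strips a BOM only at the very start of the data,
# so the BOM of the second file survives as a U+FEFF character.
text = joined.decode("utf-8-sig")
stray_boms = text.count("\ufeff")
```

Whatever reads the joined file then has to cope with that stray
character in the middle of the text.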

I think that one could argue that a plain UTF-8 file should be
considered valid/portable by plparse...  But for that to be of any value,
UTF-8 files would also have to be parsed correctly in non-UTF-8
locales, which I suppose is the reason that UTF-8 without a BOM is
currently considered non-portable.
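
As a rough sketch of what a locale-independent check could look like
(this is not how plparse works; the function name and the returned
labels are invented for the example), one could test for the BOMs
first, then for pure ASCII, and only then try a strict UTF-8 decode
that never consults the current locale:

```python
import codecs

def classify_plist_bytes(data: bytes) -> str:
    """Guess the encoding class of raw plist bytes (hypothetical helper)."""
    # Cases 2 and 3 from the list above: a leading BOM identifies the encoding.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16-bom"
    # Case 1: plain ASCII (any \U escapes are themselves ASCII characters).
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # The disputed case: no BOM, but the bytes form well-formed UTF-8.
    # The decode is strict and does not depend on the locale.
    try:
        data.decode("utf-8")
        return "utf-8-no-bom"
    except UnicodeDecodeError:
        return "unknown"

# Example: a Cyrillic value stored as UTF-8 without a BOM.
sample = '{ greeting = "Привет"; }'.encode("utf-8")
kind = classify_plist_bytes(sample)
```

Text in a legacy 8-bit encoding rarely happens to be well-formed UTF-8,
so a strict decode like this gives the same answer in every locale.
Whether that is enough to call such a file portable is the open question.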

Cheers,
David



