Re: [Openexr-devel] UTF-8
From: David Aguilar
Subject: Re: [Openexr-devel] UTF-8
Date: Wed, 14 Nov 2012 22:35:53 -0800
Thanks for the detailed explanation.
On Wed, Nov 14, 2012 at 9:11 PM, Florian Kainz <address@hidden> wrote:
> David Aguilar wrote:
>>
>> Is it not easier to treat the data like raw bytes and not care?
>>
>> I'm in favor of UTF-8 as a recommendation.
>> I'm on the fence about enforcing it in the library (it couldn't hurt).
>> I am not overly excited about pushing normalization issues into the
>> library.
>>
>> What's the driving benefit of forcing a particular normalization?
>>
>> The user used a particular form. Why not use it as-is?
>> Presumably the rest of their app uses it too, so leaving data as-is
>> lets them make the call.
>
>
> I'm not sure that treating the data as raw bytes and not caring is a
> good idea.
>
> Suppose someone hands you an OpenEXR file, and a listing of the header
> reveals the following set of channels:
>
> 공룡.R
> 공룡.G
> 공룡.B
> 배경.R
> 배경.G
> 배경.B
>
> In your image processing application you want to extract the first layer
> from the file, so you type 공룡. However, you don't know - and you
> shouldn't have to know - how the text is encoded in the file: Hangul Jamo,
> Hangul syllables (pre-composed Jamo) or a combination of both. In order
> to access the correct channel, the name in the file and the name that
> was typed in must both be converted into a common, canonical encoding.
> Unicode normalization does that.
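A minimal Python sketch of the lookup described above. It assumes NFC as the canonical form (the message does not name one; NFD would work just as well if applied consistently), and `channels_in_layer` is a hypothetical helper, not an OpenEXR function. The header spells 공룡 with conjoining Jamo while the user types precomposed syllables: a raw byte-wise prefix match finds nothing, while comparing normalized names finds all three channels.

```python
import unicodedata


def canonical(name: str) -> str:
    # Assumption: NFC is the canonical form; NFD would also work as long
    # as every string is converted to the same form.
    return unicodedata.normalize("NFC", name)


def channels_in_layer(channel_names, layer):
    # Hypothetical helper: return the channels whose layer prefix matches
    # `layer` once both sides are normalized. Original names are returned.
    prefix = canonical(layer) + "."
    return [c for c in channel_names if canonical(c).startswith(prefix)]


dino = "\uacf5\ub8e1"        # 공룡 as precomposed Hangul syllables
background = "\ubc30\uacbd"  # 배경 as precomposed Hangul syllables

# Header as it might appear in a file: 공룡 was written with conjoining Jamo.
header = [unicodedata.normalize("NFD", dino) + c for c in (".R", ".G", ".B")]
header += [background + c for c in (".R", ".G", ".B")]

typed = dino  # what the user types

raw_matches = [c for c in header if c.startswith(typed + ".")]
print(len(raw_matches))                        # 0: the code points differ
print(len(channels_in_layer(header, typed)))   # 3: canonical forms agree
```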
That makes sense. This is probably the most common use case,
so I see how it helps here. In lieu of an encoding header,
one form must be chosen, so it's best to pick one and stick with it.
I just wanted to illustrate one tiny use case where not doing
auto-normalization could be helpful.
Just thinking out loud: auto-normalization definitely makes sense for channel
and header names. Are there any use cases for raw
const char * storage? Header values?
> Similarly, if the file already contains a channel called 배경.R, encoded
> using Jamo, then it should not be possible to add another channel with
> the name 배경.R, but encoded as syllables. Code might not have a problem
> distinguishing the two channel names, but people certainly would. The
> OpenEXR library should detect an attempt to add two channels with the
> same name, and generate an appropriate error message.
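The duplicate check can be sketched the same way. `ChannelList` below is a hypothetical stand-in, not the OpenEXR `Imf::ChannelList` API: it normalizes each name to NFC (again an assumption) on insertion and rejects a name whose canonical form is already present. The stored key is the normalized spelling, so a name added as Jamo comes back as syllables.

```python
import unicodedata


class ChannelList:
    """Hypothetical channel list keyed by canonical (NFC) channel names."""

    def __init__(self):
        self._names = set()

    def insert(self, name: str) -> str:
        key = unicodedata.normalize("NFC", name)
        if key in self._names:
            # Two spellings of the same visible name: refuse the second one.
            raise ValueError(f"duplicate channel name {key!r}")
        self._names.add(key)
        return key  # the form that would be written to the file

    def __contains__(self, name: str) -> bool:
        return unicodedata.normalize("NFC", name) in self._names

    def names(self):
        return sorted(self._names)


channels = ChannelList()
bg_r = "\ubc30\uacbd.R"  # 배경.R as precomposed syllables
stored = channels.insert(unicodedata.normalize("NFD", bg_r))  # added as Jamo
print(stored == bg_r)  # True: stored in NFC, not as it was passed in

try:
    channels.insert(bg_r)  # same name, syllable spelling
except ValueError as err:
    print(err)
```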
>
> The fact that storing a string in a file and retrieving it may change its
> encoding should not be a big problem for application code that is aware of
> Unicode, since the application must already be able to handle alternate
> encodings of a string.
>
> Instead of normalizing strings before they are stored in files, the
> OpenEXR library could normalize strings on the fly before every string
> comparison. That way every string would be preserved exactly. Speed
> could be an issue, though. String comparisons are not rare, and on-the-fly
> normalization would slow them down considerably.
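For comparison, a sketch of the alternative mentioned above: keep every string exactly as written and normalize both operands at every comparison, versus normalizing once up front. The function names are hypothetical and NFC is assumed. The `timeit` call only shows where the extra work goes; actual numbers depend on the machine, and a C++ library would behave differently from Python's `unicodedata`, so no figures are claimed here.

```python
import timeit
import unicodedata


def equal_on_the_fly(a: str, b: str) -> bool:
    # Strings are stored exactly as written, so every comparison
    # normalizes both operands first.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def equal_prenormalized(a: str, b: str) -> bool:
    # Both strings were normalized once, when they were stored or read in,
    # so comparison is a plain code-point comparison.
    return a == b


typed = "\ubc30\uacbd.R"                       # 배경.R as syllables
stored = unicodedata.normalize("NFD", typed)   # same name kept as Jamo

print(equal_on_the_fly(stored, typed))         # True
print(stored == typed)                         # False: raw strings differ

stored_canon = unicodedata.normalize("NFC", stored)  # normalized once
typed_canon = unicodedata.normalize("NFC", typed)
print(equal_prenormalized(stored_canon, typed_canon))  # True

n = 100_000
t_fly = timeit.timeit(lambda: equal_on_the_fly(stored, typed), number=n)
t_pre = timeit.timeit(lambda: equal_prenormalized(stored_canon, typed_canon), number=n)
print(f"{n} comparisons: on the fly {t_fly:.3f}s, pre-normalized {t_pre:.3f}s")
```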
--
David