openexr-devel

Re: [Openexr-devel] UTF-8


From: Florian Kainz
Subject: Re: [Openexr-devel] UTF-8
Date: Wed, 14 Nov 2012 21:11:36 -0800
User-agent: Thunderbird 2.0.0.24 (X11/20100428)

David Aguilar wrote:
> Is it not easier to treat the data like raw bytes and not care?
>
> I'm in favor of UTF-8 as a recommendation.
> I'm on the fence about enforcing it in the library (it couldn't hurt).
> I am not overly excited about pushing normalization issues into the library.
>
> What's the driving benefit of forcing a particular normalization?
>
> The user used a particular form.  Why not use it as-is?
> Presumably the rest of their app uses it too, so leaving data as-is
> lets them make the call.

I'm not sure that treating the data as raw bytes and not caring is a
good idea.

Suppose someone hands you an OpenEXR file, and a listing of the header
reveals the following set of channels:

    공룡.R
    공룡.G
    공룡.B
    배경.R
    배경.G
    배경.B

In your image processing application you want to extract the first layer
from the file, so you type 공룡.  However, you don't know - and you
shouldn't have to know - how the text is encoded in the file: Hangul Jamo,
Hangul syllables (pre-composed Jamo) or a combination of both.  In order
to access the correct channel, the name in the file and the name that
was typed in must both be converted into a common, canonical encoding.
Unicode normalization does that.
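
To illustrate the lookup problem, here is a minimal Python sketch that uses
the standard unicodedata module.  It is not OpenEXR code: the helper name
canonical() and the choice of NFC as the common form are assumptions made
only for this example.

```python
import unicodedata

# The layer name as two pre-composed Hangul syllables (U+ACF5, U+B8E1).
precomposed = "\uacf5\ub8e1"

# The same name as conjoining Jamo, which is what NFD produces
# for pre-composed Hangul syllables.
decomposed = unicodedata.normalize("NFD", precomposed)


def canonical(name: str) -> str:
    """Convert a name to one canonical form (NFC is an assumption here)."""
    return unicodedata.normalize("NFC", name)


# A raw comparison sees two different code point sequences...
raw_equal = precomposed == decomposed

# ...while comparing the canonical forms finds the match.
normalized_equal = canonical(precomposed) == canonical(decomposed)

print(len(precomposed), len(decomposed), raw_equal, normalized_equal)
```

Both strings display as the same text, but only the comparison of the
canonical forms treats them as the same name.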

Similarly, if the file already contains a channel called 배경.R, encoded
using Jamo, then it should not be possible to add another channel with
the name 배경.R, but encoded as syllables.  Code might not have a problem
distinguishing the two channel names, but people certainly would.  The
OpenEXR library should detect an attempt to add two channels with the
same name, and generate an appropriate error message.
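
A rough Python sketch of that duplicate check follows.  The ChannelList
class below is a hypothetical stand-in, not the OpenEXR library's class,
and keying the table on the NFC form is an assumption.

```python
import unicodedata


class ChannelList:
    """Hypothetical channel table keyed on the normalized channel name."""

    def __init__(self):
        self._channels = {}  # normalized name -> name as supplied

    def insert(self, name: str) -> None:
        key = unicodedata.normalize("NFC", name)
        if key in self._channels:
            # Same name after normalization, whatever the encoding was.
            raise ValueError(f"channel {name!r} already exists")
        self._channels[key] = name

    def __len__(self) -> int:
        return len(self._channels)


syllables = "\ubc30\uacbd.R"                        # pre-composed syllables
jamo = unicodedata.normalize("NFD", syllables)      # same name as Jamo

channels = ChannelList()
channels.insert(jamo)

try:
    channels.insert(syllables)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected, len(channels))
```

The second insert fails even though the two strings differ code point by
code point, which is the behavior a person reading the channel list would
expect.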

The fact that storing a string in a file and retrieving it may change its
encoding should not be a big problem for application code that is aware of
Unicode, since the application must already be able to handle alternate
encodings of a string.

Instead of normalizing strings before they are stored in files, the
OpenEXR library could normalize strings on the fly before every string
comparison.  That way every string would be preserved exactly.  Speed
could be an issue, though.  String comparisons are not rare, and on-the-fly
normalization would slow them down considerably.
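
The on-the-fly alternative might look like the following Python sketch.
The function name equal_on_the_fly() and the use of NFC are assumptions for
illustration; no timings are claimed here.

```python
import unicodedata


def equal_on_the_fly(a: str, b: str) -> bool:
    """Compare two names, normalizing both operands on every call."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


typed = "\ubc30\uacbd.R"                         # typed as syllables
stored = unicodedata.normalize("NFD", typed)     # stored as Jamo

# The stored string is never rewritten, so its encoding is preserved.
preserved = stored

# Each comparison pays for two normalization passes.
match = equal_on_the_fly(stored, typed)

print(match, preserved == stored, stored == typed)
```

The trade-off is visible in the function body: the stored string stays
exactly as written, but the normalization work is repeated for every
comparison instead of being done once when the string is stored.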

Florian



