openexr-devel
Re: [Openexr-devel] UNICODE support in openexr file I/O


From: Florian Kainz
Subject: Re: [Openexr-devel] UNICODE support in openexr file I/O
Date: Thu, 19 Jan 2006 17:14:00 -0800
User-agent: Mozilla Thunderbird 1.0 (X11/20041207)


A "traditional" application that assumes 8-bit char strings are
encoded according to ISO 8859-1 instead of UTF-8 would misinterpret
"กรุงเทพมหานคร" as a run of unrelated Latin-1 characters and control
codes (the first letter alone becomes "à¸" followed by the control
code 0x81), but the application probably wouldn't crash.  ASCII text
would be handled correctly; the ASCII, ISO 8859 and UTF-8 encodings
of the ASCII character set are the same.
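
The misreading described above can be reproduced with a small Python sketch. It is only an illustration and does not use IlmImf; the variable names are made up for the example.

```python
# Illustration only: what a Latin-1-assuming reader sees in UTF-8 bytes.
original = "กรุงเทพมหานคร"

# Bytes as they would be stored if the string were UTF-8-encoded.
utf8_bytes = original.encode("utf-8")

# A "traditional" application decodes the same bytes as ISO 8859-1.
# Every byte maps to some Latin-1 code point, so this never raises;
# the result is simply the wrong text.
misread = utf8_bytes.decode("iso-8859-1")

# Pure ASCII text has identical bytes in ASCII, ISO 8859-1 and UTF-8.
ascii_text = "displayWindow"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("iso-8859-1")

print(repr(misread[:3]))
print(same_bytes)
```

Each Thai character occupies three bytes in UTF-8, so the misread string is three times as long as the original.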

Still, automatic conversion between wchar_t strings and UTF-8-
encoded char strings would be useful.

And, as Drew suggested, maybe we should specify that all strings
are UTF-8-encoded, although this would create a compatibility
issue with existing files that contain 8-bit ISO 8859 strings.
ISO 8859 is commonly used in Europe.  As far as I know, "twelve"
is encoded the same way in UTF-8 and ISO 8859-1, but "zwölf" is not.
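
To make the compatibility issue concrete, here is a small Python sketch comparing the two encodings of both words. The helper `is_valid_utf8` is hypothetical, written for this example; it is not part of OpenEXR.

```python
# Illustration only: the same words in ISO 8859-1 and in UTF-8.
latin1_twelve = "twelve".encode("iso-8859-1")
utf8_twelve = "twelve".encode("utf-8")

latin1_zwoelf = "zwölf".encode("iso-8859-1")  # 'ö' is the single byte 0xF6
utf8_zwoelf = "zwölf".encode("utf-8")         # 'ö' is the two bytes 0xC3 0xB6


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_twelve == utf8_twelve)   # ASCII-only: identical bytes
print(latin1_zwoelf == utf8_zwoelf)   # differ at the 'ö'
print(is_valid_utf8(latin1_zwoelf))   # old Latin-1 data read as UTF-8
```

A reader that insisted on UTF-8 would reject the Latin-1 bytes of "zwölf", which is the problem existing files would run into.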

Florian

Bob Friesenhahn wrote:
On Thu, 19 Jan 2006, Florian Kainz wrote:

OpenEXR uses strings in three places:

- File names
- Attribute names, for example, "displayWindow" or "pixelAspectRatio".
- Attribute values, for example the channel names in the channels attribute.

String processing operations performed by the IlmImf library include:

- copying
- comparing and sorting, in order to find attributes or channels by name
- concatenation, usually to generate error messages such as
 "Cannot open file foo.exr (Permission denied)."

As far as I know, all of those operations work the same way for regular char strings and for UTF-8-encoded Unicode. (Character counting or sub-string extraction would have to know about UTF-8, but we don't do that in IlmImf.)
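
A rough Python sketch of that claim follows, using byte strings to stand in for C char strings. The attribute names and the file name are illustrative, and none of this is IlmImf code.

```python
# Illustration only: byte-wise string operations on UTF-8 data.
names = ["zwölf", "displayWindow", "กรุงเทพ", "pixelAspectRatio"]
encoded = [n.encode("utf-8") for n in names]

# Copying and comparing work byte by byte, with no knowledge of UTF-8.
copy = bytes(encoded[0])
equal_after_copy = copy == encoded[0]

# Byte-wise sorting of UTF-8 yields the same order as sorting by code point.
sorted_by_bytes = [b.decode("utf-8") for b in sorted(encoded)]
sorted_by_code_point = sorted(names)

# Concatenation of valid UTF-8 pieces is still valid UTF-8.
message = (b"Cannot open file " + "zwölf.exr".encode("utf-8")
           + b" (Permission denied).")
message_text = message.decode("utf-8")

# Counting and sub-string extraction are where UTF-8 awareness is needed.
byte_count = len(encoded[0])   # bytes in the UTF-8 form of "zwölf"
char_count = len(names[0])     # characters in "zwölf"
truncated = encoded[0][:3]     # cuts the two-byte 'ö' in half
try:
    truncated.decode("utf-8")
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False
```

The last part shows why byte offsets cannot be treated as character offsets: slicing at an arbitrary byte can leave an incomplete sequence.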

This is all good, but there is one sticky issue. If UTF-8 is now stored in OpenEXR files, then OpenEXR may return a UTF-8 string to an application that is not designed for UTF-8 (e.g. a traditional "Unix" application), and the application may misbehave. Is this a possible scenario? If it is, then perhaps new interfaces should be added to support UTF-8, and the legacy interfaces should behave as closely as possible to how they did before (by transforming to the native character set if possible).

Bob
======================================
Bob Friesenhahn
address@hidden, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/





