Re: Endianness-specific
From: Bruno Haible
Subject: Re: Endianness-specific
Date: Sat, 6 Oct 2007 20:22:18 +0200
User-agent: KMail/1.5.4

Hi Ludovic,
> I'm trying to implement functions that convert a string in the current
> locale encoding to its UTF-{16,32} representation, for a given
> endianness.
This kind of task is outside of the scope of the uniconv/* modules.
'unistr' and 'uniconv' deal with UTF-{8,16,32} as an internal representation
of strings in memory; therefore they assume machine-dependent endianness
and alignment - and therefore can access every unit in a single memory
access.
If the endianness or alignment is different, the code needs to access
every unit byte after byte; this is not the way it's done in the 'unistr'
and 'uniconv' libraries.
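
To illustrate what "byte after byte" access means, here is a minimal sketch.
The function names are made up for this example and are not part of 'unistr'
or 'uniconv'. It reads and writes UTF-16 code units with an explicit byte
order, without assuming anything about alignment or the host's endianness:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Read one UTF-16 code unit stored in big-endian order at P.
   Only single-byte loads are used, so P may be unaligned. */
uint16_t
read_u16be (const unsigned char *p)
{
  return (uint16_t) (((unsigned int) p[0] << 8) | p[1]);
}

/* Same, for little-endian storage. */
uint16_t
read_u16le (const unsigned char *p)
{
  return (uint16_t) (((unsigned int) p[1] << 8) | p[0]);
}

/* Store N code units from SRC (native byte order, as used for in-memory
   strings) into DST as big-endian bytes.  DST must have room for 2 * N
   bytes. */
void
store_u16be (unsigned char *dst, const uint16_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      dst[2 * i] = (unsigned char) (src[i] >> 8);
      dst[2 * i + 1] = (unsigned char) (src[i] & 0xff);
    }
}
```

By contrast, code that works on native-endian, aligned units can simply
index a uint16_t array and fetch each unit with one load.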
Therefore I would recommend using the mem_cd_iconveh function from the
'striconveh' module, with FROMCODE = locale_charset() and TOCODE =
"UTF-16BE" or "UTF-16LE" (or vice versa). Or use mem_iconveh if you don't
want to reuse the conversion descriptors.
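
The calling pattern (explicit source length, result returned together with
its length, descriptor opened once and reused) can be sketched with plain
POSIX iconv(). This is not the gnulib implementation: convert_mem is a
hypothetical helper written for this example, and it lacks the error
handling and fallbacks that 'striconveh' provides.

```c
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper, not part of gnulib: convert SRCLEN bytes at SRC using
   the already opened descriptor CD.  On success, return 0 and store a
   malloc()ed buffer in *RESULTP and its length in bytes in *LENGTHP.
   Return -1 on any failure. */
int
convert_mem (iconv_t cd, const char *src, size_t srclen,
             char **resultp, size_t *lengthp)
{
  /* A fixed bound keeps the sketch short; real code would grow the buffer
     when iconv() fails with E2BIG. */
  size_t cap = 4 * srclen + 16;
  char *buf = malloc (cap);
  if (buf == NULL)
    return -1;

  /* Reset the descriptor to its initial state so that it can be reused. */
  iconv (cd, NULL, NULL, NULL, NULL);

  char *in = (char *) src;
  size_t inleft = srclen;
  char *out = buf;
  size_t outleft = cap;
  if (iconv (cd, &in, &inleft, &out, &outleft) == (size_t) -1
      /* Flush any pending shift sequence. */
      || iconv (cd, NULL, NULL, &out, &outleft) == (size_t) -1)
    {
      free (buf);
      return -1;
    }

  *resultp = buf;
  *lengthp = cap - outleft;
  return 0;
}
```

A caller would open the descriptor once, e.g. with
iconv_open ("UTF-16BE", locale_charset ()), call the helper for each buffer,
and finally call iconv_close().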
The str_cd_iconveh and str_iconveh functions are not usable here because they
look for the end of string via strlen(), and UTF-16 or UTF-32 data contains
zero bytes.
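
A small sketch of the strlen() problem, with names invented for the example:
every ASCII character in UTF-16 carries a zero byte, so strlen() stops long
before the real end of the data.

```c
#include <stddef.h>
#include <string.h>

/* "AB" encoded as UTF-16LE is the byte sequence 41 00 42 00. */
const char utf16le_ab[] = { 0x41, 0x00, 0x42, 0x00 };

/* What a strlen()-based function would believe the length to be. */
size_t
length_seen_by_strlen (void)
{
  return strlen (utf16le_ab);   /* stops at the zero byte after 0x41 */
}

/* The actual length in bytes, which must be passed explicitly. */
size_t
real_length (void)
{
  return sizeof utf16le_ab;
}
```

Here strlen() reports 1 byte while the buffer holds 4.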
I recommend the 'striconveh' module here over the 'striconv' module, because
it will work even with Solaris iconv() which can convert from anything to
UTF-8 and vice versa, but cannot convert directly e.g. between ISO-8859-2
and UTF-16LE. The 'striconveh' module does the conversion in two steps in
such a case.
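
The two-step idea can be sketched with plain POSIX iconv() as follows. This
is only an illustration of the technique under my own assumptions, not the
'striconveh' code; convert_once and convert_via_utf8 are hypothetical names.

```c
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: convert SRCLEN bytes from encoding FROM to encoding
   TO into DST, which has room for DSTCAP bytes.  Return the number of bytes
   written, or (size_t) -1 on failure. */
size_t
convert_once (const char *from, const char *to,
              const char *src, size_t srclen, char *dst, size_t dstcap)
{
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return (size_t) -1;         /* e.g. this pair is not supported */

  char *in = (char *) src;
  size_t inleft = srclen;
  char *out = dst;
  size_t outleft = dstcap;
  size_t r = iconv (cd, &in, &inleft, &out, &outleft);
  if (r != (size_t) -1)
    r = iconv (cd, NULL, NULL, &out, &outleft);   /* flush shift state */
  iconv_close (cd);
  return r == (size_t) -1 ? (size_t) -1 : dstcap - outleft;
}

/* Hypothetical helper: convert FROM -> UTF-8 -> TO, for iconv()
   implementations that offer no direct FROM -> TO conversion.  A caller
   would typically try convert_once (from, to, ...) first. */
size_t
convert_via_utf8 (const char *from, const char *to,
                  const char *src, size_t srclen, char *dst, size_t dstcap)
{
  /* Generous bound for the intermediate UTF-8 text; a sketch-level
     simplification. */
  size_t tmpcap = 6 * srclen + 16;
  char *tmp = malloc (tmpcap);
  if (tmp == NULL)
    return (size_t) -1;

  size_t n = convert_once (from, "UTF-8", src, srclen, tmp, tmpcap);
  if (n != (size_t) -1)
    n = convert_once ("UTF-8", to, tmp, n, dst, dstcap);
  free (tmp);
  return n;
}
```

For example, converting the single ISO-8859-2 byte 0xB1 (U+0105) to
"UTF-16LE" this way should yield the two bytes 05 01.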
Bruno