Re: Endianness-specific

bug-gnulib

From:	Bruno Haible
Subject:	Re: Endianness-specific
Date:	Sat, 6 Oct 2007 20:22:18 +0200
User-agent:	KMail/1.5.4

Hi Ludovic,

> I'm trying to implement functions that convert a string in the current
> locale encoding to its UTF-{16,32} representation, for a given
> endianness.

This kind of task is outside of the scope of the uniconv/* modules.
'unistr' and 'uniconv' deal with UTF-{8,16,32} as an internal representation
of strings in memory; therefore they assume machine-dependent endianness
and alignment, which lets them access every unit in a single memory
access.

If the endianness or alignment differs, the code needs to access every
unit byte by byte; this is not how the 'unistr' and 'uniconv' libraries
work.

Therefore I would recommend using the mem_cd_iconveh function from the
'striconveh' module, with FROMCODE = locale_charset() and TOCODE =
"UTF-16BE" or "UTF-16LE" (or vice versa). Or use mem_iconveh if you don't
want to reuse the conversion descriptors.

The str_cd_iconveh and str_iconveh functions are not usable here because they
look for the end of string via strlen().

I recommend the 'striconveh' module here over the 'striconv' module,
because it works even with Solaris iconv(), which can convert from anything
to UTF-8 and vice versa but cannot convert directly between, e.g.,
ISO-8859-2 and UTF-16LE. In such a case, the 'striconveh' module does the
conversion in two steps.

Bruno

Current Thread

Endianness-specific, Ludovic Courtès, 2007/10/06
- Re: Endianness-specific, Bruno Haible <=
  - Endianness-aware UTF conversion, Ludovic Courtès, 2007/10/07
    - new module iconv_open-utf (was: Re: Endianness-aware UTF conversion), Bruno Haible, 2007/10/14
    - Re: new module iconv_open-utf, Ludovic Courtès, 2007/10/15
