Hi!
I'm a developer of a mod for Civilization IV. We have used iconv tables to open the game to new languages. To do this, the game now reads UTF-8 in XML files and converts it to Windows code pages, which allows the game to run in all single-byte languages.
Now we are implementing support for Asian languages, but I'm quite lost with the conversion function. Asian characters are often encoded on 3 bytes in UTF-8, and the game reads each byte as a separate char. I'm trying to adapt the iconv function to gather the three chars (to make a wchar) and work out the Asian character.
Let's take an example with a Korean string: 아브라함 (UTF-8).
The game reads ì•„ë¸Œë ¼í•¨.
The UTF-8 byte sequences for these characters are:
아 : EC 95 84
브 : EB B8 8C
라 : EB 9D BC
함 : ED 95 A8
The string is read byte by byte as follows: ì•„ë¸Œë ¼í•¨
ì : EC
• : 95
„ : 84
ë : EB
¸ : B8
Œ : 8C
ë : EB
(unprintable) : 9D
¼ : BC
í : ED
• : 95
¨ : A8
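To make the byte lists above concrete, here is a minimal sketch of how one 3-byte UTF-8 sequence combines into a single Unicode code point. The helper name `utf8_decode3` is hypothetical and not part of libiconv, and the sketch assumes the input really is a 3-byte sequence (it does not reject overlong forms or surrogates).

```c
#include <stdint.h>

/* Hypothetical helper (not part of libiconv): combine one 3-byte UTF-8
   sequence of the shape 1110xxxx 10xxxxxx 10xxxxxx into a code point.
   Returns 0 if the three bytes do not have that shape. */
static uint32_t utf8_decode3(const unsigned char *s)
{
    if ((s[0] & 0xF0) != 0xE0 ||   /* lead byte must be 1110xxxx     */
        (s[1] & 0xC0) != 0x80 ||   /* continuation must be 10xxxxxx  */
        (s[2] & 0xC0) != 0x80)
        return 0;

    return ((uint32_t)(s[0] & 0x0F) << 12)  /* top 4 bits    */
         | ((uint32_t)(s[1] & 0x3F) << 6)   /* middle 6 bits */
         |  (uint32_t)(s[2] & 0x3F);        /* low 6 bits    */
}
```

For EC 95 84 this gives (0xC << 12) | (0x15 << 6) | 0x04 = 0xC544, which is the code point of 아. The lead byte supplies the top four bits, which is why EB 95 84 and ED 95 84 decode to different characters.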
Now, here is the original iconv function:
static int
cp949_mbtowc (conv_t conv, ucs4_t *pwc, const unsigned char *s, int n)
{
  unsigned char c = *s;
  /* Code set 0 (ASCII) */
  if (c < 0x80)
    return ascii_mbtowc(conv,pwc,s,n);
  /* UHC part 1 */
  if (c >= 0x81 && c <= 0xa0)
    return uhc_1_mbtowc(conv,pwc,s,n);
  if (c >= 0xa1 && c < 0xff) {
    if (n < 2)
      return RET_TOOFEW(0);
    {
      unsigned char c2 = s[1];
      if (c2 < 0xa1)
        /* UHC part 2 */
        return uhc_2_mbtowc(conv,pwc,s,n);
      else if (c2 < 0xff && !(c == 0xa2 && c2 == 0xe8)) {
        /* Code set 1 (KS C 5601-1992, now KS X 1001:1998) */
        unsigned char buf[2];
        int ret;
        buf[0] = c-0x80; buf[1] = c2-0x80;
        ret = ksc5601_mbtowc(conv,pwc,buf,2);
        if (ret != RET_ILSEQ)
          return ret;
        /* User-defined characters */
        if (c == 0xc9) {
          *pwc = 0xe000 + (c2 - 0xa1);
          return 2;
        }
        if (c == 0xfe) {
          *pwc = 0xe05e + (c2 - 0xa1);
          return 2;
        }
      }
    }
  }
  return RET_ILSEQ;
}
It only expects 2 bytes per character and we have 3, so I don't understand what I should do to combine the bytes into a wchar. In the example, how should the conversion of EC 95 84 be handled? The first byte still matters, since EB 95 84 and ED 95 84 are also characters...
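One thing worth noting is that cp949_mbtowc decodes CP949, the legacy Korean double-byte encoding, rather than UTF-8, which is why it never looks past two bytes. A minimal sketch of one possible direction follows: a UTF-8 decoder that uses the same calling pattern (read from `s`, store the code point in `*pwc`, return the number of bytes consumed). The names `sketch_utf8_mbtowc`, `SKETCH_ILSEQ` and `SKETCH_TOOFEW` are hypothetical stand-ins, not libiconv's own definitions, and the sketch does not reject overlong encodings or surrogates.

```c
#include <stdint.h>

/* Stand-ins for libiconv's return codes. These values are assumptions
   made for the sketch; libiconv's real definitions are not reproduced. */
#define SKETCH_ILSEQ  (-1)   /* invalid byte sequence         */
#define SKETCH_TOOFEW (-2)   /* input ends mid-character      */

/* Hypothetical UTF-8 decoder: reads one character from s (n bytes
   available), stores its code point in *pwc, and returns the number
   of bytes consumed, or a negative SKETCH_* code on failure. */
static int sketch_utf8_mbtowc(uint32_t *pwc, const unsigned char *s, int n)
{
    unsigned char c;
    uint32_t wc;
    int len, i;

    if (n < 1)
        return SKETCH_TOOFEW;
    c = s[0];

    /* The lead byte alone tells how long the sequence is. */
    if (c < 0x80) {                 /* 0xxxxxxx: ASCII, 1 byte */
        *pwc = c;
        return 1;
    } else if ((c & 0xE0) == 0xC0) {  /* 110xxxxx: 2 bytes */
        len = 2; wc = c & 0x1F;
    } else if ((c & 0xF0) == 0xE0) {  /* 1110xxxx: 3 bytes */
        len = 3; wc = c & 0x0F;
    } else if ((c & 0xF8) == 0xF0) {  /* 11110xxx: 4 bytes */
        len = 4; wc = c & 0x07;
    } else {
        return SKETCH_ILSEQ;        /* stray continuation or invalid lead */
    }

    if (n < len)
        return SKETCH_TOOFEW;

    /* Each continuation byte (10xxxxxx) contributes 6 more bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return SKETCH_ILSEQ;
        wc = (wc << 6) | (uint32_t)(s[i] & 0x3F);
    }

    *pwc = wc;
    return len;
}
```

Called on the bytes EC 95 84 it returns 3 and stores 0xC544 (아). Advancing the pointer by the return value and calling it again walks through the rest of the string one character at a time. The resulting code point would still have to be mapped to whatever the game uses internally, which this sketch does not cover.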
Thank you for your help,
Hadrien.