freetype

Re: Codepage?


From: Antoine Leca
Subject: Re: Codepage?
Date: Tue, 02 May 2000 16:49:17 +0200

Rob Kramer wrote:
> 
> > So in fact, the text is 8-bit encoded.
> >
> > So you need a way to convert from the encoding used in Windows Word
> > (I assume you, or your user, know what it is) to character indices
> > as stored in the font (and then to glyph indices using some
> > TT_CharToIndex function).
> > Do I still get it right so far?
> 
> Correct, but I guess neither I nor the user knows the encoding..

Ah! Now I get the picture. What is really lacking is the information
about the encoding used (what you call the codepage is the IBM/MS way to
name just that, conveniently for computers, which *love* to use
numbers everywhere ;-)).


> I mean, the way
> we did Thai text was by using a Thai keyboard on a normal English Win95
> installation. Somehow the keyboard produced the proper codes to match the
> font (and as far as I could see, that font only had a 'MS symbol' map).

Yes, because that is the way Win 95 keyboards work: they generate
the proper (8-bit) codes, ready to be ingested by the application, using
the default codepage associated with each language (in this case 874,
a.k.a. TIS 620).

 
> Do you say that if I want to display Russian, my Windows (or word?) should
> be in 'Russian mode',

Yes.

> and my software should too?

Yes.

> That was what I was trying to do by having the user specify a codepage..

O.K. So what you need is to find out which codepage was used to encode
the text. While the user is typing in Word, that is a simple query of the
locale of the current keyboard layout (this is getting off-topic, so I'll
keep it short); if the text is persistent (already typed), then the only
way to know is to get the codepage associated with the font.
In Word format, this information is encoded in a byte (named charset)
associated with every font: 0 means Western, i.e. Latin-1 (codepage 1252),
0xEE (238) means East European (1250), 0xCC (204) means Cyrillic (1251),
and 0xDE (222) means Thai (874).

However, the same information is not reliably recorded in the font itself
(because a font can be remapped to cover various encodings)...

 
> Can't I get an application like Word to output Unicode?

Not easily :-(

 
> > Depending on your platform, the job has most probably already been done,
> > but the particular solution you have to use (iconv, mbs[r]towcs,
> > MultiByteToWideChar, recode, ...) is platform-specific.
> 
> Are these applications or calls in some library?

Library calls (except recode). That makes them easier to use in your case :-)

iconv is nearly standard on Unix platforms.

MultiByteToWideChar is Microsoft's counterpart.

mbs[r]towcs are standard C, but you should verify yourself that
the output (the "wc" end) is really Unicode: a lot of libraries
do an awful job here; when one does it well, though, it is the
most powerful option...


Hope it helps,
Antoine


