Re: [Swftools-common] swfstrings and japanese text

From: Con Kolivas
Subject: Re: [Swftools-common] swfstrings and japanese text
Date: Thu, 29 Jun 2006 22:37:42 +1000
User-agent: KMail/1.9.3

On Tuesday 27 June 2006 23:17, Matthias Kramm wrote:
> On Tue, Jun 27, 2006 at 07:11:36PM +1000, Con Kolivas wrote:
> > One query; I can't seem to extract japanese text (kanji) with swfstrings
> > apart from the font name which is correctly displayed in kanji. Most of
> > the static text is ignored and nothing follows.
> That's an interesting feature request :)
> Well, so far swfstrings only extracts text in the standard codepage
> (iso8859-1). There's no UTF-8 output yet.
> I guess I'll add it to the TODO list.
> Do you happen to have any simple Kanji encoded sample-SWFs?

(Sample sent off-list.)

I've been looking at your code myself to see if I could help, and tracked
your output line (in v0.7.0) down to

                printf("%c", code);

which is only going to work for ASCII codes up to 127: %c writes a single
byte, whereas UTF-8 is variable length, so this probably needs a %lc passed a
wchar_t instead. All of this is new to me, so I'm not sure whether it's
obvious to others who might also find it interesting. I've never really
hacked on this sort of code before.
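
To illustrate the point about %c, here is a minimal sketch of what happens to a value above 127. It assumes `code` holds a Unicode code point, which is an assumption about swfstrings rather than something confirmed here, and `byte_from_percent_c` is a hypothetical helper made up for this example. The C standard says %c converts its int argument to unsigned char, so only the low byte is written.

```c
#include <stdio.h>

/* Hypothetical helper (not part of swftools): returns the one byte that
 * printf("%c", code) would emit for the given value. */
static unsigned char byte_from_percent_c(int code)
{
    char buf[2];

    /* %c converts the int argument to unsigned char before writing it,
     * so everything above the low 8 bits is discarded. */
    snprintf(buf, sizeof buf, "%c", code);
    return (unsigned char)buf[0];
}
```

For U+65E5 (a common kanji) this yields the single byte 0xE5, which is neither the character itself nor valid UTF-8 on its own, while plain ASCII values such as 'A' pass through unchanged.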

I thought you might find this information helpful for UTF-8 output:

UTF-8 encoding is variable-length, and characters are encoded with one, two, 
three, or four bytes. The first 128 characters of Unicode (BMP), U+0000 
through U+007F, are encoded with a single byte, and are equivalent to ASCII. 
U+0080 through U+07FF (BMP) are encoded with two bytes, and U+0800 through 
U+FFFF (still BMP) are encoded with three bytes. The 1,048,576 characters of 
the 16 Supplementary Planes are encoded with four bytes.

(from http://www-128.ibm.com/developerworks/java/library/j-u-encode.html)
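
The byte ranges quoted above can be sketched as a small encoder. This is only an illustration under stated assumptions, not the swftools implementation: `utf8_encode` and `put_utf8` are hypothetical names, and the input is assumed to be a Unicode code point. It sidesteps %lc and locale settings by writing the bytes directly.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * 'out' must have room for 4 bytes. Returns the number of bytes written,
 * or 0 if cp is a surrogate (U+D800..U+DFFF) or above U+10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* three bytes, rest of the BMP */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* four bytes, supplementary planes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical replacement for printf("%c", code): writes the UTF-8 bytes
 * of cp to fp. Returns the number of bytes written (0 if cp is invalid). */
static size_t put_utf8(unsigned long cp, FILE *fp)
{
    unsigned char buf[4];
    size_t n = utf8_encode(cp, buf);

    return n ? fwrite(buf, 1, n, fp) : 0;
}
```

With this sketch, `put_utf8(0x65E5, stdout)` would emit the three bytes E6 97 A5, which a UTF-8 terminal displays as the kanji.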


