freetype-devel

Re: [Devel] Fw: Freetype, fontconfig,Xft, Mozilla and Non-BMP char. supp


From: Antoine Leca
Subject: Re: [Devel] Fw: Freetype, fontconfig,Xft, Mozilla and Non-BMP char. support
Date: Tue, 03 Dec 2002 01:43:01 +0100
User-agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.0) Gecko/20020530

Re-bonjour:

I wrote some bullshit yesterday. In fact, I had only run simple
tests, which worked quite well, and I failed to notice a "break" inside
a for loop... How embarrassing. When I ran more in-depth tests,
they failed "mysteriously", so I had to dig a bit more, and
while digging I came to understand that I was wrong from the start, and
that the "good" behaviour I enjoyed was in fact a mere mirage.
OK, now I am more or less correct, I think, so I believe we can try to
sort things out.

Antoine Leca wrote on 2002-12-02 00:10 +0100:
>  > On Thu, 28 Nov 2002, Jungshik Shin wrote:
>  >
>  >  When FT_Select_CharMap() is called with 'FT_ENCODING_UNICODE'(or
>  > deprecated ft_encoding_unicode), freetype activates the first cmap with
>  > Unicode encoding for subsequent operations on a font until another cmap
>  > is activated. It's not a problem for fonts covering BMP only. However,
>  > fonts like Code2001 has multiple Cmaps all with the identical symbolic
>  > FT encoding 'FT_ENCODING_UNICODE' but with different char. coverage.
>  > Code2001 has 4 cmaps, pid=0,eid=0(Unicode), pid=1,eid=1(AppleRoman),
>  > pid=3(MS),eid=1(Unicode) and  pid=3(MS),eid=10(Unicode).  Only the last
>  > cmap has non-BMP characters although the first and the third are also
>  > Unicode cmap. They're actually UCS-2 cmap.
>
> You are correct.

And up to this point, I was right.
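
To make the situation concrete, here is a minimal sketch in plain C that models the four cmaps Shin lists for Code2001 and a "first match wins" lookup. The `cmap_rec` struct, `code2001_cmaps`, `is_unicode()` and `first_unicode_cmap()` are hypothetical names made up for illustration; this is not FreeType code, and the rule for what counts as "Unicode" is an assumption taken from the description above.

```c
#include <stddef.h>

/* Hypothetical stand-in for one cmap header record (not a FreeType type). */
typedef struct {
    unsigned platform_id;
    unsigned encoding_id;
} cmap_rec;

/* The four cmaps of Code2001 as listed in the quoted message, in file order. */
const cmap_rec code2001_cmaps[] = {
    { 0, 0 },   /* Unicode platform, BMP only                   */
    { 1, 1 },   /* Macintosh platform, not a Unicode cmap       */
    { 3, 1 },   /* MS Unicode, BMP only (UCS-2)                 */
    { 3, 10 },  /* MS Unicode, UCS-4: the only non-BMP coverage */
};

/* Assumption: platform 0 and MS (3,1) / (3,10) all share the same
   symbolic "Unicode" encoding, as described above. */
int is_unicode(const cmap_rec *r)
{
    return r->platform_id == 0 ||
           (r->platform_id == 3 &&
            (r->encoding_id == 1 || r->encoding_id == 10));
}

/* Returns the index of the first Unicode cmap, or -1 if there is none.
   It stops at the first hit, so later (possibly richer) cmaps are ignored. */
int first_unicode_cmap(const cmap_rec *recs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (is_unicode(&recs[i]))
            return (int)i;
    return -1;
}
```

With the table above, the lookup stops at the (0,0) entry and never reaches (3,10), which is the behaviour Shin complains about.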


>  > As mentioned above, Freetype2
>  > makes the first cmap matching 'symbolic encoding name' active and
>  > unfortunately that happens to be the one not covering non-BMP
>  > characters.
>
> Well, in fact only FT_Select_CharMap() does that.

This is not exactly correct.


> The initialisation code
> (in open_face, around line 770) does the right thing, and scans *all* the
> available charmaps before returning.

Now, this is utterly wrong! In fact, the code in open_face() is as
buggy as the one in FT_Select_CharMap(), even though the two are slightly
different (and since they do more or less the same thing, something
is not quite right here.)


> Since the 3,10 will always be the last one, it would be selected by default.

Note: the sfnt specification says that the cmap encoding records must
always be sorted in increasing order (by platform ID, then by encoding
ID). As far as I know, all font files respect this, even if the FreeType
code does not use this property to do e.g. a binary search (which would
probably be wasteful.) But this "feature" was what misled me. After a
bit of thinking, I believe we should in no way depend on such behaviour.


>  >   One possible solution is to return not the first
>  > cmap table matching the symbolic encoding name of 'FT_ENCODING_UNICODE'
>  > but to keep on looking to see if pid=3/eid=10 cmap is also present.
>  > If it is, it has to be activated instead of the first Unicode cmap
> found.
>
> And this is what open_face already does! So we have to correct this.

Even though open_face() did not in fact do it, this is the desirable
behaviour, so we should enforce it.

The problem is that UCS-4 is coded either as 0,4 (the Apple way), or as
3,10 (the MS way), or perhaps in other ways in the future (an Adobe
extension to one of its formats, leading to a 7,x version?). The place
where the FT_ENCODING values are assigned is src/xxx/xxobjs.c,
i.e. right inside the modules :-(. I do not want to break compatibility
for all the modules (though this is perhaps an option), so for the
moment I will go with Shin and hard-code the selection of the UCS-4
map inside src/base/ftobjs.c.
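
The selection rule described here can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the code committed to ftobjs.c: `cmap_entry`, `entry_is_ucs4()`, `entry_is_unicode()` and `pick_unicode_cmap()` are hypothetical names, and the hard-coded (3,10) and (0,4) pairs are simply the two UCS-4 encodings named above. The point is that the whole list is scanned, so the result does not depend on the order of the records.

```c
#include <stddef.h>

/* Hypothetical stand-in for one cmap header record (not a FreeType type). */
typedef struct {
    unsigned platform_id;
    unsigned encoding_id;
} cmap_entry;

/* UCS-4 cmaps as named in the message: 3,10 (MS) and 0,4 (Apple). */
int entry_is_ucs4(const cmap_entry *e)
{
    return (e->platform_id == 3 && e->encoding_id == 10) ||
           (e->platform_id == 0 && e->encoding_id == 4);
}

/* Assumption: any platform-0 cmap and MS 3,1 / 3,10 count as Unicode. */
int entry_is_unicode(const cmap_entry *e)
{
    return e->platform_id == 0 ||
           (e->platform_id == 3 &&
            (e->encoding_id == 1 || e->encoding_id == 10));
}

/* Returns the index of a UCS-4 cmap if one exists anywhere in the list,
   otherwise the index of the first Unicode cmap, otherwise -1. */
int pick_unicode_cmap(const cmap_entry *entries, size_t n)
{
    int    fallback = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (entry_is_ucs4(&entries[i]))
            return (int)i;              /* best possible match */
        if (fallback < 0 && entry_is_unicode(&entries[i]))
            fallback = (int)i;          /* remember it, but keep scanning */
    }
    return fallback;
}
```

A future 7,x style encoding would need another case in `entry_is_ucs4()`, which is the drawback of hard-coding the pairs in one place.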

By the way, I notice that we synthesise a 3,1 map for those formats
that do not have native Unicode maps. However, some formats, and
particularly the Adobe Glyph List, now allow for non-BMP character
names (like u1033D for the Gothic char NAUTHS, which is my usual
suspect), so this is an area that should be enhanced/revised, too.
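
As a rough illustration of what such names imply, the sketch below parses a glyph name of the form "u" followed by 4 to 6 uppercase hexadecimal digits and reports whether a set of names would need more than a BMP-only map. Both functions are hypothetical helpers, not part of FreeType or its PSnames module, and the sketch assumes only that simple "uXXXX" form: "uniXXXX" names, ligature names and suffixes after a period are deliberately not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: parses "u" + 4..6 uppercase hex digits.
   Returns the code point, or -1 if the name does not have that form
   or names a surrogate or a value above U+10FFFF. */
long u_name_to_codepoint(const char *name)
{
    size_t len = strlen(name);
    size_t i;
    long   cp = 0;

    if (name[0] != 'u' || len < 5 || len > 7)
        return -1;
    for (i = 1; i < len; i++) {
        char c = name[i];
        int  digit;

        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'A' && c <= 'F')
            digit = c - 'A' + 10;
        else
            return -1;                  /* e.g. "uni0041" fails at 'n' */
        cp = cp * 16 + digit;
    }
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;
    return cp;
}

/* Returns 1 if any name maps outside the BMP, i.e. if a synthesised
   charmap would have to be UCS-4 (3,10) rather than UCS-2 (3,1). */
int names_need_ucs4(const char *const *names, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (u_name_to_codepoint(names[i]) > 0xFFFF)
            return 1;
    return 0;
}
```

For the example from the text, "u1033D" yields 0x1033D, which lies above 0xFFFF and therefore cannot be reached through a 3,1 map.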



>  >  I believe Werner is on this list so that I won't write to him
>  > separately for a while. Werner, if you find that my patch makes sense,
>  > it'd be nice to apply it to Freetype2. BTW, it just occurred to me that
>  > the routine setting the default Cmap for a newly opened FT_Face has to
>  > be modified in a similar manner. (currently, it sets the first-found
>  > Unicode Cmap as the default, but the first-matched Unicode Cmap may not
>  > be the most extensive one as I explained above.)

Well spotted.


> No it does not, as I explained above.

And again, utter bullshit from me. :-(


Well, while checking my code, I noticed some areas that need
improvement for non-BMP characters. As I already wrote, the synthesised
charmaps need to be revised, in order to create a 3,10 charmap instead of
a 3,1 when the PSnames include some characters outside the BMP.
On a minor note, ftview needs to be enhanced: presently the character set
is limited to U+FFFF. I will enhance it later this week.
I have enhanced ftstring: now, the "message" can be encoded in UTF-8.
This means that one can look at the Gothic characters of Code2001
without requiring a full download of Mozilla (very nice when, like me,
you are behind a 56K modem and a metered phone line... ;-) ).
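
For readers unfamiliar with what decoding a UTF-8 "message" involves, here is a minimal, generic decoder for a single code point. It is a sketch only and is not the code added to ftstring; `utf8_decode_one()` is a hypothetical name. It rejects truncated sequences, overlong forms, surrogates and values above U+10FFFF.

```c
#include <stddef.h>

/* Decodes one code point from the first bytes of s (n bytes available).
   On success stores it in *out and returns the number of bytes consumed;
   returns 0 on malformed or truncated input. */
size_t utf8_decode_one(const unsigned char *s, size_t n, unsigned long *out)
{
    unsigned long cp, min;
    size_t        len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { cp = s[0];        len = 1; min = 0;       }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; min = 0x80;    }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; min = 0x800;   }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; min = 0x10000; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (len > n)
        return 0;                   /* sequence is cut short */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                   /* overlong, out of range, or surrogate */

    *out = cp;
    return len;
}
```

The Gothic letter NAUTHS (U+1033D) is the four-byte sequence F0 90 8C BD in UTF-8, so a message containing it decodes to a code point that only a UCS-4 charmap can map.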

OK, I have updated the CVS; please check. The (relevant) changes are all
in src/base/ftobjs.c, if you care to look. Please note that I did not
check whether it works on Linux (or even with gcc); I only ran a short
test on Win32.

David, Werner, I would appreciate a code review, since this is just about
my first commit to the FreeType 2 core. Thanks in advance for your comments.

And many thanks to Jungshik Shin for drawing our attention to this point.


Antoine



