[Devel] Advice on Font Subsetting
From: Salman Khilji
Subject: [Devel] Advice on Font Subsetting
Date: Wed, 24 Mar 2004 18:49:48 -0800
User-agent: KMail/1.5.1
Okay. So I have equipped myself with knowledge on how to subset Type1 fonts
for embedding in PDF. For this, I consulted the dvipdfmx project.
Unfortunately, I don't want to use the code from dvipdfmx because it
1) depends on kpathsea, and 2) relies on the existence of a corresponding
tfm file: some metrics are read from the tfm file instead of the afm file.
This would make my project dependent on an installed TeX distribution,
which is what I don't want. Moreover, FreeType in my opinion is much easier
to read than dvipdfmx.
So I want to use basically the same logic to create a Type1 subsetted font
using FreeType. I familiarized myself with how FT parses Type 1 fonts and
would like some advice please.
1) First of all, we need an array that stores the indices of the used
glyphs. We can use something like:
FT_Byte * used_glyphs = calloc( face->num_glyphs, sizeof(FT_Byte) );
Then we call FT_Get_Char_Index() repeatedly over the text to get the glyph
index corresponding to each charcode, and mark those glyphs in the
used_glyphs array. Any glyph whose entry is still 0 in used_glyphs can be
thrown out.
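
A minimal sketch of this marking step follows. To keep it compilable
without the FreeType headers, the charcode lookup is passed in as a callback;
glyph_lookup_fn, mark_used_glyphs and demo_lookup are hypothetical names, and
in real code the callback would be a thin wrapper around FT_Get_Char_Index().
It also assumes that glyph 0 is .notdef and should always be kept.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lookup callback: maps a character code to a glyph index.
   With FreeType this would wrap FT_Get_Char_Index(), which returns 0
   when the charmap has no glyph for the given code. */
typedef unsigned int (*glyph_lookup_fn)( void *face, unsigned long charcode );

/* Stand-in lookup for illustration only: 'A'..'Z' map to glyphs 1..26,
   everything else maps to glyph 0. */
static unsigned int
demo_lookup( void *face, unsigned long charcode )
{
  (void)face;
  if ( charcode >= 'A' && charcode <= 'Z' )
    return (unsigned int)( charcode - 'A' + 1 );
  return 0;
}

/* Returns an array of num_glyphs flags, set to 1 for every glyph used by
   text[0..len-1], or NULL on allocation failure. The caller frees it. */
static unsigned char *
mark_used_glyphs( void *face, glyph_lookup_fn lookup, long num_glyphs,
                  const unsigned long *text, size_t len )
{
  unsigned char *used;
  size_t         i;

  assert( num_glyphs > 0 );

  /* calloc takes the element count first, then the element size */
  used = calloc( (size_t)num_glyphs, sizeof ( unsigned char ) );
  if ( !used )
    return NULL;

  used[0] = 1;  /* assumption: glyph 0 is .notdef and is always kept */

  for ( i = 0; i < len; i++ )
  {
    unsigned int gid = lookup( face, text[i] );

    if ( gid < (unsigned long)num_glyphs )
      used[gid] = 1;
  }

  return used;
}
```

Calling mark_used_glyphs( NULL, demo_lookup, 27, text, n ) on a text
containing only 'A' and 'C' would leave entries 0, 1 and 3 set and all others
clear.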
2) Then you basically start reading the pfb file and copying it into
another buffer as is. dvipdfmx uses a memory-based buffer for this purpose.
Should I use a file-based buffer instead? pfb files can potentially be large,
so I am concerned about wasting memory here. On the other hand, a file-based
buffer requires creating a temp file, which the client then has to read back.
Which one should I use?
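
For comparison, the memory-based option can be quite small. The following is
only a sketch of a growable output buffer; OutBuf and outbuf_append are
hypothetical names, not part of FreeType or dvipdfmx, and the 4096-byte
starting capacity is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer for the subsetted font data. */
typedef struct OutBuf_
{
  unsigned char *data;  /* heap storage, NULL while empty */
  size_t         len;   /* bytes currently stored         */
  size_t         cap;   /* bytes allocated                */
} OutBuf;

/* Appends n bytes from src. Returns 0 on success, -1 if memory runs out
   (the buffer contents are left unchanged in that case). */
static int
outbuf_append( OutBuf *b, const void *src, size_t n )
{
  assert( b != NULL );

  if ( n == 0 )
    return 0;

  if ( n > b->cap - b->len )
  {
    size_t         new_cap = b->cap ? b->cap : 4096;
    unsigned char *p;

    /* double the capacity until the new data fits */
    while ( new_cap - b->len < n )
      new_cap *= 2;

    p = realloc( b->data, new_cap );
    if ( !p )
      return -1;

    b->data = p;
    b->cap  = new_cap;
  }

  memcpy( b->data + b->len, src, n );
  b->len += n;
  return 0;
}
```

A buffer starts out as OutBuf b = { NULL, 0, 0 }; and is released with
free( b.data ). The client can then write b.data to a file or hand it to the
PDF writer directly, without a temp file.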
3) The parser needs modification. We have functions like
T1_Skip_PS_Token() and T1_Skip_Spaces(), which advance the cursor from its
current location. Functions that read an integer token likewise seem to
consume the current token and advance the cursor. I would have to modify
them so that, instead of discarding the text they pass over, they copy it
into the buffer from Step 2).
Right now I am thinking that I might need mods like this:
    cur_before = parser->root.cursor;
    T1_Skip_PS_Token( parser );
    cur_after = parser->root.cursor;
    Copy_Into_Buffer( buffer, cur_before, cur_after - cur_before );
However, I don't feel like scattering all this subsetting-related logic
throughout the parser. Any suggestions from the FreeType gurus? Maybe we
need to create a specialized parser for subsetting (say T1_ParserEmbed). It
could have its own parsing functions that copy the input into the output
buffer and do the other subsetting work.
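
One way to keep the copying out of the individual skip functions is a single
wrapper that runs any skip function and copies whatever it consumed. The
sketch below uses a toy parser rather than FreeType's: MiniParser, Sink,
skip_spaces, skip_token and skip_and_copy are hypothetical stand-ins, and the
toy tokenizer only splits on whitespace, which real PostScript tokenizing
does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parser state: just a cursor and a limit. */
typedef struct MiniParser_
{
  const unsigned char *cursor;
  const unsigned char *limit;
} MiniParser;

/* Hypothetical fixed-size output sink, kept tiny for the example. */
typedef struct Sink_
{
  unsigned char data[256];
  size_t        len;
} Sink;

typedef void (*skip_fn)( MiniParser *p );

static int
is_space( int c )
{
  return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Toy skip functions: they only advance the cursor, like the originals. */
static void
skip_spaces( MiniParser *p )
{
  while ( p->cursor < p->limit && is_space( *p->cursor ) )
    p->cursor++;
}

static void
skip_token( MiniParser *p )
{
  while ( p->cursor < p->limit && !is_space( *p->cursor ) )
    p->cursor++;
}

/* Runs `skip` and copies the bytes it consumed into `out`.
   Returns 0 on success, -1 if the sink is full. */
static int
skip_and_copy( MiniParser *p, skip_fn skip, Sink *out )
{
  const unsigned char *before = p->cursor;
  size_t               n;

  skip( p );
  n = (size_t)( p->cursor - before );

  if ( n > sizeof ( out->data ) - out->len )
    return -1;

  memcpy( out->data + out->len, before, n );
  out->len += n;
  return 0;
}
```

Text that should be kept goes through skip_and_copy(); text that should be
dropped (an unused glyph, say) goes through the plain skip function, so the
skip functions themselves stay untouched.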
4) If I encounter an /Encoding entry that happens to be an array, then I
must throw out the entries for all the glyphs that are not being used. I will
need access to the used_glyphs array for this purpose. Where should we store
the used_glyphs array? Should I modify the FT_FaceRec_ struct?
5) The private dictionary needs to be decrypted. FT currently does this.
After decryption, we throw away the glyphs that we don't need. Everything
else in the private dictionary is copied verbatim. We then need to re-encrypt
the whole thing. dvipdfmx does this. I will need to add the t1_encrypt
function from the dvipdfmx project. This is basically just a few lines of code.
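
For reference, the cipher itself is the one described in Adobe's Type 1 font
format specification. The sketch below is written from that description, not
copied from dvipdfmx; the names t1_encrypt_sketch and t1_decrypt_sketch are
hypothetical. It covers only the byte transform: in a real font the eexec
plaintext additionally starts with 4 random bytes, and charstrings are
encrypted separately with key 4330 and lenIV leading bytes.

```c
#include <assert.h>
#include <stddef.h>

#define T1_EEXEC_KEY  55665u  /* initial key for the eexec section */
#define T1_C1         52845u
#define T1_C2         22719u

/* Encrypts len bytes from plain into cipher (the two may be the same
   buffer). The running key r is kept to 16 bits. */
static void
t1_encrypt_sketch( const unsigned char *plain, unsigned char *cipher,
                   size_t len, unsigned int key )
{
  unsigned int r = key & 0xFFFFu;
  size_t       i;

  assert( plain != NULL && cipher != NULL );

  for ( i = 0; i < len; i++ )
  {
    unsigned char c = (unsigned char)( plain[i] ^ ( r >> 8 ) );

    r         = ( ( c + r ) * T1_C1 + T1_C2 ) & 0xFFFFu;
    cipher[i] = c;
  }
}

/* Inverse transform; the key update uses the cipher byte in both
   directions. */
static void
t1_decrypt_sketch( const unsigned char *cipher, unsigned char *plain,
                   size_t len, unsigned int key )
{
  unsigned int r = key & 0xFFFFu;
  size_t       i;

  assert( plain != NULL && cipher != NULL );

  for ( i = 0; i < len; i++ )
  {
    unsigned char c = cipher[i];

    plain[i] = (unsigned char)( c ^ ( r >> 8 ) );
    r        = ( ( c + r ) * T1_C1 + T1_C2 ) & 0xFFFFu;
  }
}
```

Because each key update depends on the previous cipher byte, the private
dictionary has to be re-encrypted from its start after glyphs are removed; the
untouched parts cannot simply be reused in their encrypted form.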
6) After the private dictionary, we copy everything verbatim to the buffer.
The hardest part is the parser of course. Another approach would be to
create a new parser that is a specialization of the current one. The new
parser's methods like skip_spaces can be modified to copy into the new
buffer rather than simply skipping the input. I definitely feel like I can
use some suggestions here from the gurus.
7) Would it make sense for me to contribute the code back and try to get it
into the official FreeType distribution? Is there any interest?
Salman