Re: [Devel] Optimizatins to ttinterp.c
From: Werner LEMBERG
Subject: Re: [Devel] Optimizatins to ttinterp.c
Date: Sat, 09 Dec 2000 03:24:06 +0100 (CET)
Here are some comparison values from FT2, measured with gprof.
> My test was to open "Arial Unicode MS" and TT_Load_Glyph all the
> glyphs so that I could call TT_Get_Glyph_Extents. In doing so, I
> found that RunIns() got called 52,794 times for 13,494 ms. (This
> font has a lot of Kanji and Korean glyphs, many of which are
> composites). I tried loading without hinting, but that made my
> extents inaccurate.
I used the following sample program, compiled with gcc 2.95.2, once
with -O0 to switch off optimization and once with -O3 to get maximum
optimization (according to the gcc info pages, the latter also inlines
`simple functions').
The font used is Arial Unicode MS version 0.84, which probably
explains the different number of glyphs.
======================================================================
#include <freetype/freetype.h>
#include <freetype/ftglyph.h>

int main(void)
{
  FT_Library  library;
  FT_Face     face;
  FT_Glyph    glyph;
  FT_BBox     bbox, cbox;
  FT_UShort   i;

  (void)FT_Init_FreeType(&library);
  (void)FT_New_Face(library, "arialuni.ttf", 0, &face);
  (void)FT_Set_Char_Size(face, 0, 16 * 64, 100, 100);

  for (i = 0; i < 51180; i++)
  {
    (void)FT_Load_Glyph(face, i, FT_LOAD_DEFAULT);
    (void)FT_Get_Glyph(face->glyph, &glyph);
  }

  return 0;
}
======================================================================
> Of the functions that RunIns() calls, the major ones were...
>
> function      percent      calls   propagated time
> Calc_Length    17.57%  9,474,392          2,370 ms
> Ins_SHP        14.08   1,014,967          1,899 ms
> Ins_IUP        11.07     103,158          1,493 ms
> Ins_IP         10.04     521,928          1,355 ms
Here are the call graph results from gprof for the above test program
(the -O3 build, judging by the self time of TT_RunIns):
index  % time    self  children           called  name
[8]      67.4    5.83      7.59            65063      TT_RunIns [8]
                 0.63      1.68    127692/127692          Ins_IUP [10]
                 0.97      0.96  1413151/1413151          Ins_SHP [11]
                 0.63      0.65    715371/715371          Ins_IP [14]
                 0.48      0.26    722667/722667          Ins_MIRP [16]
Calc_Length() no longer exists.  Ins_IUP needs 0.63 + 1.68 s, so this
function accounts for (0.63+1.68)/(5.83+7.59) = 17.2% of RunIns(), etc.
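The same ratio can be worked out for the other children of TT_RunIns.
Below is a minimal sketch in C, using only the figures from the call
graph above.  The names `cg_row', `share_of_parent' and `print_shares'
are hypothetical helpers made up for this illustration; they are not
part of FreeType or gprof.

```c
#include <stdio.h>

/* One row of a gprof call graph entry: seconds spent in the function */
/* itself and in the functions it calls.  (Hypothetical helper type.) */
struct cg_row
{
  const char*  name;
  double       self;
  double       children;
};

/* Share of the parent's total time (self + children) that is spent */
/* in `child' and its descendants, in percent.                      */
static double
share_of_parent(struct cg_row parent, struct cg_row child)
{
  return 100.0 * (child.self + child.children)
               / (parent.self + parent.children);
}

/* Print the share of each child listed in the call graph. */
static void
print_shares(void)
{
  /* values copied from the gprof call graph entry for TT_RunIns */
  struct cg_row  run_ins    = { "TT_RunIns", 5.83, 7.59 };
  struct cg_row  children[] =
  {
    { "Ins_IUP",  0.63, 1.68 },
    { "Ins_SHP",  0.97, 0.96 },
    { "Ins_IP",   0.63, 0.65 },
    { "Ins_MIRP", 0.48, 0.26 }
  };
  size_t  i;

  for (i = 0; i < sizeof (children) / sizeof (children[0]); i++)
    printf("%-10s %5.1f%%\n",
           children[i].name,
           share_of_parent(run_ins, children[i]));
}
```

Calling print_shares() from main() prints one line per function; the
Ins_IUP line corresponds to the 17.2% computed by hand above.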
> The big gain came from TT_MulDiv(). It turns out that all the other
> time soaking functions end up calling it.
Interesting.  I get different results.  Here are the first few entries
of the flat profile for -O0:
   %  cumulative     self              self    total
  time   seconds  seconds    calls  us/call  us/call  name
 12.43      6.05     6.05    65063    92.99   592.84  TT_RunIns
 10.75     11.28     5.23  4828603     1.08     1.08  Project_x
  9.47     15.89     4.61  4473580     1.03     1.03  Project_y
  4.46     18.06     2.17  2255569     0.96     0.96  Round_To_Grid
  4.25     20.13     2.07    64221    32.23    55.43  TT_Load_Simple_Glyph
  3.88     22.02     1.89  1413151     1.34     4.28  Ins_SHP
  3.55     23.75     1.73  1677664     1.03     1.03  Direct_Move_X
  3.37     25.39     1.64  1540309     1.06     1.06  Direct_Move_Y
  3.02     26.86     1.47  6152015     0.24     0.48  FT_MulDiv
  2.88     28.26     1.40   715371     1.96    11.02  Ins_IP
  2.73     29.59     1.33  3847019     0.35     0.56  Interp
  2.63     30.87     1.28  7862798     0.16     0.16  FT_MulFix
  2.30     31.99     1.12   722667     1.55     5.83  Ins_MIRP
  2.20     33.06     1.07  1416595     0.76     2.78  Compute_Point_Displacement
  2.03     34.05     0.99   812915     1.22     1.42  Ins_ENDF
  2.03     35.04     0.99   127692     7.75    24.65  Ins_IUP
  1.91     35.97     0.93  3369363     0.28     0.28  SkipCode
  1.89     36.89     0.92  7518177     0.12     0.12  FT_Get_Char
  1.62     37.68     0.79  1606929     0.49     0.49  FT_MulTo64
  1.56     38.44     0.76    64221    11.83   625.13  TT_Process_Simple_Glyph
  1.44     39.14     0.70  2317733     0.30     0.30  FT_Get_Short
And here are the first few entries of the flat profile for -O3:
   %  cumulative     self              self    total
  time   seconds  seconds    calls  us/call  us/call  name
 29.28      5.83     5.83    65063    89.61   206.27  TT_RunIns
  6.58      7.14     1.31  6049655     0.22     0.31  FT_MulDiv
  6.23      8.38     1.24    64221    19.31    36.94  TT_Load_Simple_Glyph
  5.73      9.52     1.14  3847019     0.30     0.44  Interp
  5.32     10.58     1.06  7862794     0.13     0.13  FT_MulFix
  4.87     11.55     0.97  1413151     0.69     1.36  Ins_SHP
  3.62     12.27     0.72  7518177     0.10     0.10  FT_Get_Char
  3.16     12.90     0.63   715371     0.88     1.79  Ins_IP
  3.16     13.53     0.63   127692     4.93    18.07  Ins_IUP
  2.66     14.06     0.53    64221     8.25   230.91  TT_Process_Simple_Glyph
  2.41     14.54     0.48   722667     0.66     1.03  Ins_MIRP
  2.36     15.01     0.47  2317733     0.20     0.20  FT_Get_Short
  1.81     15.37     0.36    51180     7.03   356.17  load_truetype_glyph
  1.61     15.69     0.32  4828603     0.07     0.07  Project_x
  1.51     15.99     0.30  1531719     0.20     0.20  FT_Div64by32
  1.36     16.26     0.27   812862     0.33     0.33  Ins_CALL
  1.16     16.49     0.23  1563135     0.15     0.15  FT_MulTo64
  1.16     16.72     0.23    51180     4.49    12.04  compute_glyph_metrics
  1.05     16.93     0.21    51180     4.10     4.10  FT_Outline_Get_CBox
  1.00     17.13     0.20  1677664     0.12     0.12  Direct_Move_X
  1.00     17.33     0.20    51182     3.91     3.91  TT_Load_Context
The cumulative execution time with -O0 is about 49 seconds; FT_MulDiv()
itself accounts for about 3.0% of it (see the flat profile above).
The cumulative execution time with -O3 is about 20 seconds; FT_MulDiv()
uses about 6.5%.
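The total run times can be cross-checked against the flat profiles:
each row gives a function's self time and its percentage of the whole
run, so the total is roughly self time divided by that percentage, and
the `self us/call' column is the self time divided by the call count.
A small sketch in C of both relations follows; `total_seconds' and
`self_us_per_call' are hypothetical helper names for this
illustration, not gprof or FreeType functions.

```c
#include <stdio.h>

/* Total run time (in seconds) implied by one flat profile row, given */
/* the row's self time and its `% time' value.                        */
static double
total_seconds(double self_seconds, double percent)
{
  return self_seconds * 100.0 / percent;
}

/* Self time per call in microseconds, as in the `self us/call' column. */
static double
self_us_per_call(double self_seconds, unsigned long calls)
{
  return self_seconds * 1.0e6 / (double)calls;
}

/* Print the totals implied by the TT_RunIns rows of both profiles */
/* and the per-call cost of FT_MulDiv.                             */
static void
print_cross_check(void)
{
  /* TT_RunIns rows: 6.05 s at 12.43% (-O0), 5.83 s at 29.28% (-O3) */
  printf("total, -O0: %.1f s\n", total_seconds(6.05, 12.43));
  printf("total, -O3: %.1f s\n", total_seconds(5.83, 29.28));

  /* FT_MulDiv rows: 1.47 s in 6152015 calls (-O0), */
  /*                 1.31 s in 6049655 calls (-O3)  */
  printf("FT_MulDiv, -O0: %.2f us/call\n",
         self_us_per_call(1.47, 6152015UL));
  printf("FT_MulDiv, -O3: %.2f us/call\n",
         self_us_per_call(1.31, 6049655UL));
}
```

Because gprof rounds the printed values, the results only agree with
the quoted totals to within rounding.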
I haven't actually tried gcc's `inline' option.
Werner