Here's a small set of patches that slightly improves FreeType performance when compiled with GCC for ARM and x86_64. I also checked that they don't hurt performance on 32-bit x86.
On ARM, loading glyphs is about 3% faster, and rendering gray bitmaps is 6% faster.
On x86_64, loading glyphs is 6% faster, and rendering gray bitmaps is 2.5% faster.
I found these optimizations by inspecting the machine code GCC generates for the hot spots.
Let me know if you find any issues.
All numbers below come from:
./ftbench -p -t 5 -s 14 -f 0008 Arial.ttf
(-f 0008 selects FT_LOAD_NO_BITMAP.)
ARM:
====
CFLAGS="-O2 -fomit-frame-pointer -march=armv7-a -mthumb" ./configure --disable-shared --without-zlib --without-png --without-bzip2 --host=arm-linux-androideabi
Benchmark (us/op)         Before     After
Load                      34.287    33.358
Load_Advances (Normal)    34.317    33.330
Load_Advances (Fast)       0.176     0.176
Render                    23.544    22.079
Get_Glyph                  6.661     6.494
Get_CBox                   1.957     1.937
Get_Char_Index             0.261     0.232
Iterate CMap             121.696   120.793
New_Face                 115.143   115.759
Embolden                   1.428     1.450
Get_BBox                   3.313     3.384
x86_64:
=======
CFLAGS="-O2 -fomit-frame-pointer" ./configure --disable-shared --without-zlib --without-png --without-bzip2
Benchmark (us/op)         Before     After
Load                       4.890     4.617
Load_Advances (Normal)     4.849     4.537
Load_Advances (Fast)       0.027     0.028
Render                     2.813     2.743
Get_Glyph                  0.473     0.441
Get_CBox                   0.076     0.076
Get_Char_Index             0.024     0.023
Iterate CMap              13.982    13.508
New_Face                  12.341    12.298
Embolden                   0.027     0.027
Get_BBox                   0.303     0.296
x86:
====
CFLAGS="-O2 -fomit-frame-pointer -m32" LDFLAGS="-m32" ./configure --disable-shared --without-zlib --without-png --without-bzip2
Benchmark (us/op)         Before     After
Load                       4.973     4.930
Load_Advances (Normal)     4.910     4.895
Load_Advances (Fast)       0.023     0.023
Render                     3.140     3.131
Get_Glyph                  0.641     0.620
Get_CBox                   0.243     0.237
Get_Char_Index             0.027     0.027
Iterate CMap              15.303    15.051
New_Face                  13.041    13.133
Embolden                   0.167     0.163
Get_BBox                   0.527     0.524