Re: [Devel] Benchmarking FreeType
From: David Turner
Subject: Re: [Devel] Benchmarking FreeType
Date: Sat, 08 Jun 2002 03:41:41 +0200
Hi,
Vincent Caron wrote:
>
> On Fri, 2002-06-07 at 09:39, David Turner wrote:
> > I'd also like to include "ftbench.c" as part of the "ft2demos" package,
> > however, it will need a few small cleanups since its source code probably
> > only compiles with GCC due to statements like:
> >
> > {
> > int array1[ face->num_glyphs ]; // dynamic stack allocation !!
> > int array2[ face->num_glyphs ];
> >
> > ....
> > }
> >
> > Could you polish it a bit before we include it ??
>
> Done, still available here : http://zerodeux.net/ft/
>
> - the stack allocation code has been replaced by a good old calloc().
> Anyway I was fetching the whole charmap at every bench(), yek. This kind
> of code works well with gcc or MSVC, but it's effectively a C extension
> (non-POSIX).
>
> - the 'bench time' constant is available from the command line
>
> - the tests can be selected from the command line
>
> - added the legal ft2demo blurb. I'm hereby giving this code and its
> copyright to the FreeType project, if it has to be said clearly.
>
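
The allocation change discussed in the quoted message can be sketched roughly as follows. This is only an illustration, not the actual ftbench.c code: the helper name alloc_glyph_table is hypothetical, and the only thing taken from the thread is the idea of replacing a variable-length stack array sized by face->num_glyphs with calloc().

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not taken from ftbench.c.
 *
 * Instead of writing
 *     int array1[ face->num_glyphs ];
 * which needs variable-length array support from the compiler,
 * allocate a zero-initialised buffer on the heap.
 *
 * Returns NULL when num_glyphs is not positive or when memory
 * is exhausted.  The caller releases the buffer with free().
 */
int *alloc_glyph_table(long num_glyphs)
{
    if (num_glyphs <= 0)
        return NULL;

    return (int *)calloc((size_t)num_glyphs, sizeof(int));
}
```

Allocating once, outside the timed loop, also avoids rebuilding the table on every bench() call.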
Thanks a lot, it is now in "ft2demos/src/ftbench" and should compile
on all platforms supported by FreeType. There are probably still a
couple of compiler warnings, but I'll get rid of them very soon.

Since I was optimizing the cache manager when you first posted about
your benchmark program, I had the pleasure of using it to tune the
code. The result is a 20% to 50% speed-up on all cache hits,
depending on your compiler, CPU and operating system.

Note that all caches are now significantly faster. For example, the
CMap cache is now a _lot_ faster than calling FT_Get_Char_Index
directly, while preserving "abstractness"...
Here's the output for Arial.ttf on a 233 MHz Mobile Pentium:

  Load                        : 428.819 us/op
  Load + Get_Glyph            : 471.258 us/op
  Load + Get_Glyph + Get_CBox : 475.309 us/op
  Get_Char_Index              :   1.882 us/op
  CMap cache (1st run)        :  -0.000 us/op
  CMap cache                  :   0.899 us/op
  Outline cache (1st run)     : 455.247 us/op
  Outline cache               :   2.189 us/op
  Bitmap cache (1st run)      : 564.043 us/op
  Bitmap cache                :   1.708 us/op
  SBit cache (1st run)        : 556.327 us/op
  SBit cache                  :   1.369 us/op

Just another good reason to use the FT2 cache in your applications
and libraries, instead of rewriting your own :-)
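
For readers wondering how a "us/op" figure of this kind can be produced, here is a minimal sketch of a timing loop driven by a fixed "bench time". It is an assumption about the general approach, not a copy of ftbench.c: the names bench_us_per_op, bench_op and count_op are hypothetical, and clock() is used only because it is available in standard C.

```c
#include <time.h>

/* Hypothetical callback type: one benchmarked operation. */
typedef void (*bench_op)(void *user);

/* Trivial example operation: counts how often it was called. */
void count_op(void *user)
{
    ++*(unsigned long *)user;
}

/*
 * Run 'op' repeatedly until at least 'bench_seconds' of CPU time
 * have elapsed, then return the average cost in microseconds per
 * operation.  Calls are issued in batches of 1000 so that the cost
 * of clock() itself stays small compared to the measured work.
 */
double bench_us_per_op(bench_op op, void *user,
                       double bench_seconds, unsigned long *ops_out)
{
    unsigned long ops = 0;
    clock_t       start = clock();
    double        elapsed;

    do {
        int i;

        for (i = 0; i < 1000; i++)
            op(user);
        ops += 1000;

        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < bench_seconds);

    if (ops_out)
        *ops_out = ops;

    return elapsed * 1e6 / (double)ops;
}
```

With a timer of limited resolution, very cheap operations or very short runs can yield figures that are close to meaningless, so a longer bench time generally gives more stable numbers.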
Thanks,
- David Turner
- The FreeType Project (www.freetype.org)
PS:
For the technically savvy, these improvements were obtained by
applying three important techniques:

 - inlining a few common routine calls and optimizing
   the result (i.e. rewriting ftc_node_mru_up, which called
   ftc_node_mru_unlink and ftc_node_mru_link)

 - performing template instantiation (possible in C with
   a few pre-processor tricks, see the file src/cache/ftccache.i)
   to speed up cache hits.

 - finally, changing the internal hash table computation. The
   previous algorithm was based on the one used by GLib.
   I replaced it with a straightforward implementation of
   linear hashing, which is significantly faster while preventing
   any "chokes" when resizing the buckets array.

   For more details on this algorithm, see:
   http://citeseer.nj.nec.com/34453.html
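
To make the last point more concrete, here is a minimal sketch of linear hashing with chained buckets. It is not the code from src/cache: all names (LinHash, lh_insert, and so on) are hypothetical, the hash value doubles as the key to keep things short, and the growth threshold of two nodes per bucket is an arbitrary choice. What it illustrates is that the table grows by splitting a single bucket at a time, so no insertion ever has to rehash the whole table.

```c
#include <stdlib.h>

typedef struct LhNode {
    unsigned       hash;   /* also used as the key in this sketch */
    int            value;
    struct LhNode *next;
} LhNode;

typedef struct {
    LhNode  **buckets;
    unsigned  mask;    /* bucket count at start of the round, minus 1 */
    unsigned  p;       /* split pointer: next bucket to be split      */
    unsigned  count;   /* number of stored nodes                      */
} LinHash;

int lh_init(LinHash *t)
{
    t->mask = 3;
    t->p = 0;
    t->count = 0;
    t->buckets = (LhNode **)calloc(t->mask + 1, sizeof(LhNode *));
    return t->buckets ? 0 : -1;
}

/* Buckets below the split pointer were already split in this
 * round, so they are addressed with one more hash bit. */
static unsigned lh_index(const LinHash *t, unsigned hash)
{
    unsigned idx = hash & t->mask;

    if (idx < t->p)
        idx = hash & (2 * t->mask + 1);
    return idx;
}

/* Grow by exactly one bucket: split bucket 'p' in two. */
static int lh_split(LinHash *t)
{
    unsigned  new_idx = t->mask + 1 + t->p;   /* index of new bucket */
    LhNode  **nb, **link, *high = NULL;

    nb = (LhNode **)realloc(t->buckets, (new_idx + 1) * sizeof(LhNode *));
    if (!nb)
        return -1;
    t->buckets = nb;

    link = &nb[t->p];
    while (*link) {
        LhNode *n = *link;

        if (n->hash & (t->mask + 1)) {   /* extra bit set: move it */
            *link = n->next;
            n->next = high;
            high = n;
        } else {
            link = &n->next;
        }
    }
    nb[new_idx] = high;

    if (++t->p > t->mask) {              /* round done: double mask */
        t->mask = 2 * t->mask + 1;
        t->p = 0;
    }
    return 0;
}

int lh_insert(LinHash *t, unsigned hash, int value)
{
    LhNode  *n = (LhNode *)malloc(sizeof(LhNode));
    unsigned idx;

    if (!n)
        return -1;
    idx = lh_index(t, hash);
    n->hash = hash;
    n->value = value;
    n->next = t->buckets[idx];
    t->buckets[idx] = n;
    t->count++;

    /* A failed split is harmless: the table just stays denser. */
    if (t->count > 2 * (t->mask + 1 + t->p))
        (void)lh_split(t);
    return 0;
}

int lh_find(const LinHash *t, unsigned hash, int *value_out)
{
    const LhNode *n = t->buckets[lh_index(t, hash)];

    for (; n; n = n->next)
        if (n->hash == hash) {
            *value_out = n->value;
            return 1;
        }
    return 0;
}
```

Each split touches only the nodes of one bucket, which is what keeps insertion cost even as the table grows.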