Re: performance of m4-1.9a (was: popdef(undefined), __m4_version__)

m4-patches


From:	Eric Blake
Subject:	Re: performance of m4-1.9a (was: popdef(undefined), __m4_version__)
Date:	Mon, 11 Aug 2008 20:38:36 +0000 (UTC)
User-agent:	Loom/3.14 (http://gmane.org/)

Ralf Wildenhues <Ralf.Wildenhues <at> gmx.de> writes:

> 
> Hello again, and apologies for breaking the threading,

No problem.  In truth, this is enough of an independent topic to be worth the 
broken threading.

> 
> I've done a wee bit of measuring now.  Time for running autoconf in OpenMPI
> is 15s with branch-1_4 and branch-1.6, 27s with master, and 23s when master
> is configured --disable-shared.

Thanks for the stats.

> 
> Then, a gprof comparison between 1.6 and master shows that a significant other
> part of the slowdown is due to the fact that master has to do an indirect
> function call for every character in next_char.  Can't the module interface
> use larger boundaries than character for its interface, like reading a whole
> token or so?  I mean, we're talking about roughly 140M function calls here.

Sweet!  Your measurements confirm what I already suspected.  This means a
performance patch is already in the pipeline: once I port stage 29 from the
argv_ref branch (currently at [1], although that branch is still being
rewound at times as I rebase in various bug fixes), the input engine will be
doing just that, reading blocks of data rather than single bytes.

[1] http://git.savannah.gnu.org/gitweb/?p=m4.git;a=commitdiff;h=32c3fec7
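
To make the difference concrete, here is a rough sketch of the two interface
shapes.  None of these names come from the m4 sources; the struct, the
callbacks, and the call counter are all hypothetical, chosen only to show why
a block-oriented callback needs far fewer indirect calls than a per-character
one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input source, NOT m4's real data structure.  */
typedef struct input_source input_source;
struct input_source
{
  const char *data;     /* backing bytes */
  size_t len;           /* total number of bytes */
  size_t pos;           /* current read offset */
  unsigned long calls;  /* indirect calls made so far */
  /* Per-character interface: one indirect call per byte.  */
  int (*read_char) (input_source *);
  /* Block interface: one indirect call per chunk of bytes.  */
  size_t (*read_block) (input_source *, const char **);
};

/* Return the next byte, or -1 at end of input.  */
int
string_read_char (input_source *in)
{
  in->calls++;
  return in->pos < in->len ? (unsigned char) in->data[in->pos++] : -1;
}

/* Hand out everything that is left in one go; return 0 at end.  */
size_t
string_read_block (input_source *in, const char **start)
{
  size_t n = in->len - in->pos;
  in->calls++;
  *start = in->data + in->pos;
  in->pos = in->len;
  return n;
}

void
init_source (input_source *in, const char *text)
{
  in->data = text;
  in->len = strlen (text);
  in->pos = 0;
  in->calls = 0;
  in->read_char = string_read_char;
  in->read_block = string_read_block;
}

/* Consume the input through the per-character callback.  */
size_t
count_by_char (input_source *in)
{
  size_t n = 0;
  while (in->read_char (in) != -1)
    n++;
  return n;
}

/* Consume the input through the block callback.  */
size_t
count_by_block (input_source *in)
{
  const char *chunk;
  size_t got;
  size_t n = 0;
  while ((got = in->read_block (in, &chunk)) != 0)
    n += got;
  return n;
}
```

For an N-byte string, count_by_char makes N+1 indirect calls (the last one
only to discover end of input), while count_by_block makes 2.  A real input
engine would still have to scan the bytes inside each chunk, but that scan
can be a plain loop that the compiler is free to optimize.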

> 
> Then, I saw that debug stuff like m4_set_current_{file,line} was called veeery
> often (more than once per character).  Rebuilding optimized with -DNDEBUG got
> master to 18s (with --disable-shared).

OK, something I will take a look at improving.  The speedup from -DNDEBUG
comes from avoiding function-call overhead, thanks to inline accessor macros,
but avoiding changing the current line and file more than necessary still
seems like a good idea.  At any rate, './configure --disable-assert' is very
much a performance improvement, on all of the m4 branches.
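
A minimal sketch of both ideas follows.  The context struct and every
identifier in it are hypothetical stand-ins rather than the real m4
declarations: the accessors are real functions with sanity checks in a
debugging build and collapse to plain field accesses when NDEBUG is defined,
and advance_line shows one way to touch the current line once per block
instead of once per character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context, NOT the real m4 structure.  */
typedef struct
{
  const char *current_file;
  int current_line;
} context;

/* Checked accessors: each use costs a function call.  */
int
get_current_line_func (context *ctx)
{
  assert (ctx != NULL);
  return ctx->current_line;
}

void
set_current_line_func (context *ctx, int line)
{
  assert (ctx != NULL);
  ctx->current_line = line;
}

/* With NDEBUG defined, the accessors become direct field accesses,
   so the compiler emits no call at all.  */
#ifdef NDEBUG
# define get_current_line(C)    ((C)->current_line)
# define set_current_line(C, V) ((C)->current_line = (V))
#else
# define get_current_line(C)    get_current_line_func (C)
# define set_current_line(C, V) set_current_line_func (C, V)
#endif

/* Count newlines locally and update the context at most once per
   block, rather than once per character.  */
void
advance_line (context *ctx, const char *buf, size_t len)
{
  int newlines = 0;
  size_t i;
  for (i = 0; i < len; i++)
    if (buf[i] == '\n')
      newlines++;
  if (newlines)
    set_current_line (ctx, get_current_line (ctx) + newlines);
}
```

Starting from line 1, advance_line (&ctx, "a\nb\nc", 5) leaves the current
line at 3 after a single setter call, and a block without newlines does not
touch the context at all.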

> 
> The gprof output files seem to indicate that next_char is called much more
> often from m4__next_token in master than next_char_1 is from next_token in
> branch-1.6. However, gcov output does not confirm this, so I guess this is
> an artifact from finite sampling density (and the amount that next_char_1
> is faster) or inlining artifacts.

This doesn't surprise me: in branch-1.6, the macro next_char inlines the common 
case of rereading from a string, avoiding a number of next_char_1 calls, but in 
master, there is no inlining because all access is done through indirect 
functions.
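
Roughly, the pattern looks like the sketch below.  The names here are made up
for illustration and are not the branch-1.6 definitions: a macro handles the
common case of a non-empty string input inline and only falls back to an
out-of-line function for everything else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input block, NOT the branch-1.6 structure.  */
typedef enum { INPUT_STRING, INPUT_OTHER } input_type;

typedef struct
{
  input_type type;
  const char *str;           /* remaining text, for INPUT_STRING */
  size_t len;                /* bytes remaining in str */
  unsigned long slow_calls;  /* times the slow path was taken */
} input_block;

/* Out-of-line slow path: handles exhausted strings and every other
   kind of input.  This sketch simply reports end of input.  */
int
next_char_slow (input_block *in)
{
  in->slow_calls++;
  return -1;
}

/* Fast path: when rereading from a non-empty string, fetch the byte
   inline and never call a function.  Note that IN is evaluated more
   than once, so it must not have side effects.  */
#define next_char_fast(IN)                              \
  ((IN)->type == INPUT_STRING && (IN)->len != 0         \
   ? ((IN)->len--, (int) (unsigned char) *(IN)->str++)  \
   : next_char_slow (IN))

/* Drain the input, returning how many bytes were read.  */
size_t
drain (input_block *in)
{
  size_t n = 0;
  while (next_char_fast (in) != -1)
    n++;
  return n;
}
```

Draining a three-byte string input this way reads all three bytes inline and
enters next_char_slow exactly once, at end of input.  When every access goes
through a function pointer, there is no such inline path, so each byte costs
a call.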

-- 
Eric Blake
