

Re: performance of m4-1.9a (was: popdef(undefined), __m4_version__)

From: Eric Blake
Subject: Re: performance of m4-1.9a (was: popdef(undefined), __m4_version__)
Date: Mon, 11 Aug 2008 20:38:36 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Ralf Wildenhues <Ralf.Wildenhues <at> gmx.de> writes:

> Hello again, and apologies for breaking the threading,

No problem.  In truth, this is enough of an independent topic to be worth the 
broken threading.

> I've done a wee bit of measuring now.  Time for running autoconf in OpenMPI
> is 15s with branch-1_4 and branch-1.6, 27s with master, and 23s when master
> is configured --disable-shared.

Thanks for the stats.

> Then, a gprof comparison between 1.6 and master shows that a significant other
> part of the slowdown is due to the fact that master has to do an indirect
> function call for every character in next_char.  Can't the module interface
> use larger boundaries than a character for its interface, like reading a whole
> token or so?  I mean, we're talking about roughly 140M function calls here.

Sweet!  Your measurements confirmed what I already suspected.  And this means a
performance patch is already in the pipeline: the moment I port stage 29 from
the argv_ref branch (currently at [1], although that branch is still being
actively rewound at times as I rebase in various bug fixes), the input engine
will be doing just that, reading blocks of data rather than bytes.

[1] http://git.savannah.gnu.org/gitweb/?p=m4.git;a=commitdiff;h=32c3fec7
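To make the per-byte versus per-block difference concrete, here is a minimal sketch, assuming a hypothetical input-source interface: the names `input_ops`, `read_byte`, `peek_block`, `consume` and `string_source` are illustrative only and are not m4's actual module API. The per-byte path pays one indirect call per character; the block path pays one indirect call per buffer and then scans the bytes with ordinary inline code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical input-source interface; not m4's real module API. */
struct input_ops {
  /* Per-byte interface: one indirect call per character; -1 at EOF. */
  int (*read_byte) (void *state);
  /* Block interface: expose the next run of buffered bytes without
     copying.  Returns the number of bytes available (0 at EOF). */
  size_t (*peek_block) (void *state, const char **buf);
  /* Mark LEN bytes of the current block as consumed. */
  void (*consume) (void *state, size_t len);
};

/* A trivial string-backed source, for illustration only. */
struct string_source {
  const char *text;
  size_t pos;
  size_t len;
};

static int string_read_byte (void *state)
{
  struct string_source *s = state;
  return s->pos < s->len ? (unsigned char) s->text[s->pos++] : -1;
}

static size_t string_peek_block (void *state, const char **buf)
{
  struct string_source *s = state;
  *buf = s->text + s->pos;
  return s->len - s->pos;
}

static void string_consume (void *state, size_t len)
{
  struct string_source *s = state;
  s->pos += len;
}

static const struct input_ops string_ops = {
  string_read_byte, string_peek_block, string_consume
};

/* Count occurrences of C one byte at a time: one indirect call per byte. */
static size_t count_bytewise (const struct input_ops *ops, void *state, int c)
{
  size_t n = 0;
  int ch;
  while ((ch = ops->read_byte (state)) != -1)
    if (ch == c)
      n++;
  return n;
}

/* Count occurrences of C a block at a time: a few indirect calls per
   block, with the inner scan done by memchr on the exposed buffer. */
static size_t count_blockwise (const struct input_ops *ops, void *state, int c)
{
  size_t n = 0;
  const char *buf;
  size_t len;
  while ((len = ops->peek_block (state, &buf)) > 0)
    {
      const char *p = buf;
      const char *end = buf + len;
      while ((p = memchr (p, c, end - p)) != NULL)
        {
          n++;
          p++;
        }
      ops->consume (state, len);
    }
  return n;
}
```

A real tokenizer cannot always consume a whole block (a token may straddle a buffer boundary, or an expansion may push new input), which is why this sketch's `consume` takes an explicit length.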

> Then, I saw that debug stuff like m4_set_current_{file,line} was called veeery
> often (more than once per character).  Rebuilding optimized with -DNDEBUG got
> master to 18s (with --disable-shared).

OK, something I will take a look at improving.  The speedup from -DNDEBUG comes
from avoiding the overhead of a function call, thanks to inline accessor macros,
but not changing the current line and file more often than necessary also seems
like a good idea.  At any rate, './configure --disable-assert' is very much a
performance improvement, on all of the m4 branches.
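The two ideas above (a checked accessor function in debug builds that collapses to a plain field access under NDEBUG, and skipping location updates that change nothing) might look roughly like this. This is a sketch under stated assumptions: the `context` struct and the `context_get_line`/`context_set_location` names are hypothetical stand-ins, not m4's real `m4_set_current_{file,line}` implementation, and file names are assumed to be interned so that pointer comparison suffices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interpreter context; not m4's real structure. */
struct context {
  const char *file;        /* assumed interned, so compared by pointer */
  int line;
  unsigned long updates;   /* counts real stores, for illustration */
};

#ifdef NDEBUG
/* Release build: the accessor is a macro, so no function call remains. */
# define context_get_line(c) ((c)->line)
#else
/* Debug build: a real function that can validate its argument. */
static int context_get_line (const struct context *c)
{
  assert (c != NULL);
  return c->line;
}
#endif

/* Store the location only when it actually differs, so hot-path callers
   can call this freely without rewriting the context every time. */
static void context_set_location (struct context *c, const char *file, int line)
{
  if (c->line != line || c->file != file)
    {
      c->file = file;
      c->line = line;
      c->updates++;
    }
}
```

Building with `-DNDEBUG` (which is what `--disable-assert` arranges) turns every `context_get_line` call site into a direct load.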

> The gprof output files seem to indicate that next_char is called much more
> often from m4__next_token in master than next_char_1 is from next_token in
> branch-1.6. However, gcov output does not confirm this, so I guess this is
> an artifact of finite sampling density (and of how much faster next_char_1
> is) or an inlining artifact.

This doesn't surprise me: in branch-1.6, the macro next_char inlines the common
case of rereading from a string, avoiding a number of next_char_1 calls, but in
master, there is no inlining because all access is done through indirect
function calls.
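A minimal sketch of that fast-path idea, assuming a simplified single-level input source: the `input_block` struct, the `isp` pointer and `next_char_slow` are hypothetical, and branch-1.6's actual `next_char`/`next_char_1` differ in detail. The macro handles the common case of a string with bytes left inline at each call site, and only calls out of line for files, exhausted strings, or end of input.

```c
#include <stdio.h>

enum input_type { INPUT_STRING, INPUT_FILE };

/* Hypothetical input block; a real m4 keeps a stack of these. */
struct input_block {
  enum input_type type;
  const char *str;   /* remaining text when type == INPUT_STRING */
  FILE *fp;          /* open stream when type == INPUT_FILE */
};

static struct input_block *isp;  /* current input, or NULL at end of input */

/* Out-of-line slow path: files, exhausted strings, end of input. */
static int next_char_slow (void)
{
  int ch;
  if (isp == NULL)
    return EOF;
  if (isp->type == INPUT_STRING)
    {
      if (*isp->str != '\0')
        return (unsigned char) *isp->str++;
      isp = NULL;   /* a real engine would pop to the next input block */
      return EOF;
    }
  ch = getc (isp->fp);
  if (ch == EOF)
    isp = NULL;
  return ch;
}

/* Fast path expanded at every call site: rereading from a string costs a
   few compares and a pointer bump, with no function call at all. */
#define next_char()                                               \
  (isp != NULL && isp->type == INPUT_STRING && *isp->str != '\0'  \
   ? (unsigned char) *isp->str++                                   \
   : next_char_slow ())
```

Routing every read through an indirect call, as master does, gives up exactly this kind of call-site inlining.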

Eric Blake

