Re: [lmi] Compiling takes longer with gcc-4.9.2


From: Vadim Zeitlin
Subject: Re: [lmi] Compiling takes longer with gcc-4.9.2
Date: Sun, 3 Jan 2016 23:13:15 +0100

On Thu, 31 Dec 2015 04:57:31 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2015-12-22 00:28, Greg Chicares wrote:
GC> > My local tree contains makefile changes to use MinGW-w64 gcc-4.9.2,
GC> > with '-std=c++11', some new warning flags, and various other
GC> > adjustments.
GC> 
GC> Hardware change:
GC>   old: dual    E5520  ; sata 6Gbps, SSD = samsung 850 pro
GC>   new: dual E5-2630 v3; sata 3Gbps, HDD = wd caviar black WD2003FZEX

 Do you plan to move the SSD from the old machine to the new one later? It
would be interesting to see if it affects the results.

GC> > First I compile wx and wxPdfDoc:
GC> [...]
GC> > make --jobs=8 -f install_wx.make > ../log 2>&1
GC> > 2662.23s user 1594.59s system 475% cpu 14:55.38 total
GC> 
GC> I ran out of memory with eight cores, so I used six here:
GC> 
GC> make --jobs=6 -f install_wx.make > ../log 2>&1
GC> 1966.27s user 1080.01s system 389% cpu 13:02.98 total

 I've redone the benchmarks on a Linux machine (a notebook with an
i7-4712HQ, 16GiB RAM and a 1TB SSD) to compare the relative speed of
compiling inside the VM and cross-compiling. The first set of benchmarks
shows the time needed to build wxWidgets after configuring it with the
options from install_wx.make. I also listed the size of the build
directory, to give an idea of the PCH disk-space overhead, and the CPU use
reported by the "time" shell built-in, as a general sanity check. Note
that the CPU-use column is not filled in for builds using compilers that
are not native to the build OS, because the numbers are manifestly wrong
in that case (30-40%, even though the Windows task manager shows all CPUs
fully used during most of the build).

 Also notice that my times are for "make -s" only, as otherwise
install_wx.make would be penalized compared to plain configure+make
because it also does other things (e.g. uncompressing the archive and
verifying its integrity). However, this underestimates the advantage of
cross-compiling, as the configure step takes 8.8 seconds under Linux and
60+ seconds under Cygwin (probably because configure launches a lot of
child processes, which is just about the worst thing for Cygwin and
Windows from a performance point of view).
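
 Just to be explicit about the methodology, the numbers below were
obtained roughly along these lines (the directory name is a placeholder
and the exact time/du invocations are only a sketch of what I actually
ran):

  cd /path/to/wx-build   # placeholder for the configured build directory
  time make -s -j8       # gives the "Time" and "CPU use" columns
  du -sm .               # gives the "Size" column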

 Anyhow, after all these introductory notes, here is the first bunch of
results:

wxMSW build             Time (s)        Size (MB)       CPU use (%)
=========================================================================
MinGW 3.4.5             341               82
Cygwin 4.9.2            424              463            637
MinGW-w64 4.9.1         429              462
Debian 4.9.1            322              567            629

The expected result here is that cross-compiling is significantly faster
than building in the VM, as I was hoping. In fact, for me it is even
faster than using 3.4.5 inside the VM, so it's already a gain. However,
the situation is clearly not the same for you and for me, as I don't see
anything like a 3x slowdown for the in-VM builds in the first place. A
half-surprise is that the native compiler is not faster than the Cygwin
one on this machine, unlike in my previous tests, and I'm not sure why
that is; but even after redoing the build several times I still see the
same thing: both builds take about the same time (a 5s difference is not
really outside the measurement error, as I do see a 2-3s variation
between builds). To summarize, without doing anything else, I get a ~25%
slowdown when building inside the VM with the new compiler but a slight
(~5%) speedup when cross-compiling.


 But this is just the beginning, not the end, of our benchmarking story.
As you remember, I was surprised by how little the precompiled headers
helped when cross-compiling lmi. So let's see whether they help when
building wxWidgets itself, i.e. configure it with the extra
--disable-precomp-headers option. Here are the numbers:

wxMSW no PCH            Time (s)        Size (MB)       CPU use (%)
=========================================================================
Cygwin 4.9.2            396              134            711
MinGW-w64 4.9.1         355              134
Debian 4.9.1            269              136            737

This was a huge surprise to me, as I didn't expect such big gains from
_disabling_ the PCH. But we gain ~7%, 17% and 16% respectively just by
doing this. So, for me, simply disabling the precompiled headers brings
4.9.1 almost in line with 3.4.5 when building inside the VM (there is
just a 4% difference, which is really not much, especially considering
that we're speaking of -O2 builds and that 4.9 should optimize much, much
better than 3.4.5 -- it would be interesting to run the lmi benchmarks
later to check exactly how much), and cross-compiling is more than 20%
faster than using the old compiler. The only regression is in the size of
the build files, but notice that the size of the DLLs produced is roughly
the same, so it's not really a problem.
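
 For concreteness, the no-PCH builds are just the normal ones
reconfigured with this single extra option; a hypothetical
cross-compiling invocation (the source path and the remaining options
stand for whatever install_wx.make normally passes) would be roughly:

  ../wxWidgets/configure --host=i686-w64-mingw32 --enable-monolithic \
      --disable-precomp-headers   # plus the usual install_wx.make options
  time make -s -j8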


 Still, the most shocking discovery was that the PCH had such a huge
negative effect in the first place. At this stage I was seriously
doubting my sanity, so I decided to rebuild wxGTK to check whether it is
affected by the PCH in the same way. The answer was an emphatic no:

wxGTK                   Time (s)        Size (MB)       CPU use (%)
=========================================================================
default                 121             902             759
no PCH                  184              73             771

There is certainly a huge space penalty for using the PCH, but the build
is also 33% faster with it (still less than I expected, but at least
positive). So, after thinking about this for a bit, I realized that the
difference could be due to using --enable-monolithic for lmi but not for
the default builds, so I decided to try without it, using the 4.9.1
cross-compiler (as it's the fastest):

wxMSW multilib          Time (s)        Size (MB)       CPU use (%)
=========================================================================
default                 210             1434            729
no PCH                  240              100            772

As you can see, there is still a huge space difference, but at least now
the PCH build is ~12% faster. Perhaps more importantly, even without PCH
the multilib build is ~11% faster than the monolithic one. So if you want
faster builds, it could be worth no longer using the monolithic library.
Notice that it would also be worth using the --without-opengl configure
option in either case, as lmi doesn't use wxGLCanvas, but this is a small
gain.
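
 Concretely, such a "fast" configuration would drop --enable-monolithic
and add --without-opengl while leaving the PCH enabled, i.e. something
like this (hypothetical invocation, the remaining options unchanged):

  # multilib build: no --enable-monolithic, PCH left at its default (on),
  # OpenGL disabled as lmi doesn't use wxGLCanvas
  ../wxWidgets/configure --host=i686-w64-mingw32 --without-opengl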

 To summarize the story so far: the fastest approach is to cross-compile
instead of compiling inside the VM, and to disable the PCH when using the
monolithic build. But not using the monolithic build in the first place
is even faster, giving a 22% gain (at the price of an extra 1.3GB of disk
space) compared to the monolithic build.


 However, notice that all these numbers are for the compiler's default
C++ dialect, which is still C++03 for g++ 4.9 (the default changes in
later gcc releases). Adding CXXFLAGS=-std=c++11 changes the numbers for
all the builds, and not for the better. To give an idea, here are the
results for wxGTK:

wxGTK C++11             Time (s)        Size (MB)       CPU use (%)
=========================================================================
default                 176             1210            769
no PCH                  267               73            774

For wxGTK there is a constant (i.e. PCH-independent) slowdown of 45%,
which is really pretty shocking: while I did expect compiling the C++11
standard library headers to take longer, simply because they are a strict
superset of the older standard's, the magnitude of the difference is an
unwelcome surprise. Also, using the PCH is still worthwhile in this case,
although the space penalty is even worse (but then the extra 300MB is not
a big deal compared to the almost 1GB already taken by the PCH anyhow;
just for comparison, a full build, with everything enabled, uses 4.3GB
with PCH and ~700MB without).
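
 To be clear, the C++11 builds are just the same configurations with the
dialect passed at configure time, e.g. (hypothetical invocation again):

  ../wxWidgets/configure --host=i686-w64-mingw32 CXXFLAGS=-std=c++11
  time make -s -j8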

For wxMSW builds the situation is less catastrophic but still pretty bad:

wxMSW C++11             Time (s)        Size (MB)       CPU use (%)
=========================================================================
Cygwin 4.9.2 no PCH     501              133            721
MinGW-w64 4.9.1 no PCH  455              133
Debian 4.9.1            377              641            649
Debian 4.9.1 no PCH     360              135            749

The default build slows down by "just" 17%, probably because a big part
of its time was already spent in linking. For the faster (again, only in
the monolithic case) PCH-less build the slowdown is a whopping 33%, and
it's only slightly smaller (~27%) for the builds inside the VM.

 So C++11 support doesn't come for free in build-time terms. I'd still
like to enable it because it brings important benefits in terms of
development time, i.e. productivity. And, at least in relative terms,
using 4.9.1 as a C++11 cross-compiler is only 15s slower than using 3.4.5
inside the VM, so I'm pretty confident it will still be faster for you on
your new machine than on the old one, and hence I hope that we can still
start using it.


GC> > Now I do a complete rebuild of lmi, which I very recently measured
GC> > with mingw.org's native gcc-3.4.5 as follows:
GC> > 
GC> >   --jobs=16
GC> > 15.89s user 26.48s system 21% cpu 3:18.89 total
...
GC> > ...and here are results for MinGW-w64 gcc-4.9.2 as installed by
GC> > lmi's current 'install_cygwin.bat' (package 'mingw64-i686-gcc-g++'):
GC> [...snipping all but the best run, with four vCPUs...]
GC> > make --jobs=4 install check_physical_closure > ../log 2>&1
GC> > 1324.14s user 581.83s system 341% cpu 9:17.47 total
...
GC> make --jobs=6 install check_physical_closure > ../log 2>&1
GC> 871.03s user 683.03s system 482% cpu 5:21.92 total
...
GC> Building lmi itself was the most painful gcc-4.9.2 slowdown with the
GC> old machine. The new machine is considerably faster than the old,
GC> but even new hardware with gcc-4.9.2 can't compete with old hardware
GC> and gcc-3.4.5 .

 I think cross-compiling should be even more beneficial for you, because
you should then be able to use -j16 without problems.

GC> > I can't figure out why the best result comes from '--jobs=4'. If the
GC> > number of CPUs isn't the bottleneck, what is? disk? RAM? CPU cache?
GC> 
GC> The hardware comparison suggests to me that it's not disk-bound or
GC> CPU-bound. I guess it's just the 32-bit guest OS.

 FWIW, for me it is CPU-bound. All 8 (logical) CPUs are pegged during
compilation, and one of them remains at 100% use throughout the configure
and linking steps too.


GC> Conclusion: gcc-4.9.2 should be used with a 64-bit OS; there's no
GC> point in trying to breathe life into this old 32-bit VM.

 This is probably correct, and setting up a 64-bit VM probably makes
sense anyhow. But you could also use the cross-compiler. I don't have
exact numbers for lmi because the autotools-based build system doesn't do
exactly the same thing as the official one, so they can't be compared
directly. But the 25% gain seen above seems quite achievable, given that
I currently see a ~40% difference (which, again, is partly just because
less is being done).

 The slightly worse news is that cross-compiling all the dependencies is
a bit tricky, and I ran into several problems doing it. It's supposed to
be as simple as just using --host=i686-w64-mingw32, but in practice there
are some bugs preventing this from working, and I'm going to post some
notes about what I did to work around them a bit later (unless you tell
me you don't need them because you've already figured all this out while
I was running the endless build benchmarks).
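
 To give an idea, for each dependency the intended (if, as explained
above, not always working) recipe is just the standard autotools
cross-build, along these lines:

  # hypothetical example for one dependency; $PREFIX stands for whatever
  # installation prefix lmi's build expects
  ./configure --host=i686-w64-mingw32 --prefix=$PREFIX
  make -j16 && make install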

 Please let me know how you would like to proceed and what else you
think would be interesting to do.

 Best regards,
VZ
