Re: [lmi] Compiling takes longer with gcc-4.9.2


From: Greg Chicares
Subject: Re: [lmi] Compiling takes longer with gcc-4.9.2
Date: Sun, 17 Jan 2016 17:51:45 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0

On 2015-12-22 00:28, Greg Chicares wrote:
> My local tree contains makefile changes to use MinGW-w64 gcc-4.9.2,
> with '-std=c++11', some new warning flags, and various other
> adjustments.

I'm almost ready to commit those changes. First, however, I tried them
out on different hardware, and found a PCH surprise. Timings quoted
from the 2015-12-22 original used old hardware; new timings reported
today use new hardware unless explicitly indicated otherwise.

  old: 2 x E5520,      3 Gbps sata ii,  wd black WD2003FZEX, debian-7.7
  new: 2 x E5-2530 v3, 6 Gbps sata iii, samsung 850 pro 1TB, debian-7.9

I expect the new hardware to perform uniformly better than the old.

> First I compile wx and wxPdfDoc:
> 
> This is 32-bit msw-xp in a VM

Selfsame VM on both machines. Both have 16 vCPUs. The only difference
is that the new machine's qemu settings specify a "Sandy Bridge" guest:
that's the latest CPU recognized by this software version, and I found
that it gives better performance than the "Nehalem" setting on the old
machine.

> make --jobs=8 -f install_wx.make > ../log 2>&1
> 2662.23s user 1594.59s system 475% cpu 14:55.38 total

make --jobs=8 -f install_wx.make > ../log 2>&1
2727.15s user 6098.59s system 214% cpu 1:08:39.33 total

Sixty-eight minutes on new hardware vs. fifteen on old. Unbelievable.
Something else must have changed...and here it is:

http://svn.savannah.nongnu.org/viewvc/lmi/trunk/install_wx.make?root=lmi&r1=6443&r2=6463&diff_format=u
+  --disable-precomp-headers \

Removing that one line, I measure the same build, twice:

make --jobs=8 -f install_wx.make > ../log 2>&1
1848.43s user 1444.34s system 391% cpu 14:01.53 total
1843.52s user 1446.63s system 389% cpu 14:05.59 total

All builds succeed. Building lmi succeeds with or without PCH. The only
difference between these wx builds is PCH, which makes the build about
five times as fast.
At least for now, I'm going to revert that part of revision 6463.
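
As a rough check on the "about five times" figure, the timings above can
be reduced to ratios. This is only a sketch: it assumes the lines follow
the layout that zsh's 'time' prints, and the helper names
(elapsed_seconds, parse_time_line) are made up for illustration.

```python
import re

def elapsed_seconds(stamp):
    """Convert 'H:MM:SS.ss' or 'M:SS.ss' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

# Assumed layout: "<user>s user <system>s system <pct>% cpu <elapsed> total"
TIME_LINE = re.compile(
    r"([\d.]+)s user ([\d.]+)s system (\d+)% cpu ([\d:.]+) total"
)

def parse_time_line(line):
    """Split one timing line into its numeric fields."""
    m = TIME_LINE.search(line)
    if m is None:
        raise ValueError("unrecognized timing line: " + line)
    return {
        "user": float(m.group(1)),
        "system": float(m.group(2)),
        "cpu_pct": int(m.group(3)),
        "elapsed": elapsed_seconds(m.group(4)),
    }

# New hardware, gcc-4.9.2, wx build: figures copied from this message.
no_pch = parse_time_line("2727.15s user 6098.59s system 214% cpu 1:08:39.33 total")
pch = parse_time_line("1848.43s user 1444.34s system 391% cpu 14:01.53 total")

speedup = no_pch["elapsed"] / pch["elapsed"]
extra_user = no_pch["user"] - pch["user"]
extra_system = no_pch["system"] - pch["system"]

print("wall-clock ratio, no PCH vs. PCH: %.2f" % speedup)
print("extra user seconds without PCH:   %.2f" % extra_user)
print("extra system seconds without PCH: %.2f" % extra_system)
```

On these figures the wall-clock ratio comes to roughly 4.9, and most of
the additional CPU time without PCH is system time rather than user time.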

Monitoring the performance in msw-xp "task manager", I noticed something
curious in the first run, without PCH. When 'config' completed, eight
compiler instances started. Each used about 10MB at first, then 20, then
30, then 40: their "mem usage" advanced slowly in lockstep, the figures
being almost identical for all processes. There was at least a gigabyte
of free memory at all times. It sure looks like they're contending for
the same resource, which probably isn't RAM, though I don't know what it
might be. The bottleneck might even be the preprocessor.

I temporarily restored
+  --disable-precomp-headers \
and tried again, just to be sure the first run wasn't an anomaly;
but it behaved like the first run, and I Ctrl-C'd it after half
an hour.

I retested this on the old machine with the old compiler, and found
very little difference with or without PCH:
 - Old machine, gcc-3.4.5, with '--disable-precomp-headers' as in HEAD:
    make $coefficiency -f install_wx.make > ../log 2>&1
    501.14s user 896.77s system 165% cpu 14:03.39 total
 - Old machine, same, but using PCH ('--disable-precomp-headers' removed):
    make $coefficiency -f install_wx.make > ../log 2>&1
    501.43s user 933.09s system 176% cpu 13:33.30 total
The "task manager" graphs were almost indistinguishable. I'd guess
that gcc-4.9.2 taxes the 32-bit OS in a way that gcc-3.4.5 does not,
and in a way that PCH greatly relieves.
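
The "very little difference" on the old machine can be put in numbers,
and the reported CPU percentages cross-checked, with a few lines of
arithmetic. A sketch only: it assumes that the "% cpu" column is
(user + system) / elapsed, truncated to a whole percent, and the helper
name is hypothetical.

```python
def elapsed_seconds(stamp):
    """Convert 'M:SS.ss' or 'H:MM:SS.ss' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

# (user s, system s, elapsed) for the two old-machine gcc-3.4.5 runs above.
runs = {
    "without PCH": (501.14, 896.77, "14:03.39"),
    "with PCH": (501.43, 933.09, "13:33.30"),
}

busy_cores = {}
for label, (user, system, stamp) in runs.items():
    wall = elapsed_seconds(stamp)
    # Average number of CPUs kept busy over the whole run.
    busy_cores[label] = (user + system) / wall
    print("%-12s %7.2f s wall, %.2f CPUs busy" % (label, wall, busy_cores[label]))

# Fraction of wall-clock time saved by enabling PCH.
saving = 1 - elapsed_seconds("13:33.30") / elapsed_seconds("14:03.39")
print("wall-clock saving with PCH: %.1f%%" % (100 * saving))
```

That is a saving of under 4% in wall-clock time, and the computed
averages of about 1.7 busy CPUs agree with the 165% and 176% reported.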

> make --jobs=8 -f install_wxpdfdoc.make > ../log 2>&1
> 446.34s user 182.72s system 393% cpu 2:39.91 total

 - wx compiled without PCH:
make --jobs=8 -f install_wxpdfdoc.make > ../log 2>&1
295.29s user 307.44s system 272% cpu 3:41.44 total

 - wx compiled with PCH--seems like that might help:
make --jobs=8 -f install_wxpdfdoc.make > ../log 2>&1
290.17s user 273.35s system 288% cpu 3:15.20 total

It does seem odd that wxPdfDoc built faster on the old machine.
I'm not terribly worried about that: it doesn't take long, and
we rarely rebuild it; still, it's weird.

> Now I do a complete rebuild of lmi, which I very recently measured
> with mingw.org's native gcc-3.4.5 as follows:
> 
>   --jobs=16
> 15.89s user 26.48s system 21% cpu 3:18.89 total

This was chronologically the first thing I did after migrating the VM,
so these results demonstrate vCPU tuning. I did
  rm -rf /lmi/src/build/lmi/CYGWIN_NT-5.1/gcc/ship
before each run, to force a complete rebuild.

VM as migrated (copy the image and 'virsh define' with the same xml)
has almost exactly the same performance:

3:45.55 cold (first time the VM was loaded)
3:18.67 repeat
3:22.25 repeat

Change vCPU from "Nehalem" to "Sandy Bridge", which virt-manager's
"Copy host CPU configuration" chose for the old and new machines
respectively; set CPU topology to 1-8-2 (sockets-cores-threads):

2:58.94
2:56.92

Same, but topology 2-8-2, with 16 vCPUs as always:

2:54.64
2:54.56

I'm not sure whether the difference is significant, but I'll use 2-8-2.
There isn't much published guidance, but see:
  
https://www.berrange.com/posts/2010/02/12/controlling-guest-cpu-numa-affinity-in-libvirt-with-qemu-kvm-xen/
| If our guest workload required 8 virtual CPUs, since each NUMA node
| only has 4 physical CPUs, better utilization may be obtained by
| running a pair of 4 cpu guests & splitting the work between them,
| rather than using a single 8 cpu guest.
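
To judge whether the 1-8-2 and 2-8-2 timings above differ meaningfully,
it helps to average each pair and compare the gap with the spread inside
a pair. A minimal sketch, assuming the "as migrated" VM still had the
"Nehalem" vCPU setting and leaving out its cold first run; the labels and
helper name are mine.

```python
from statistics import mean

def elapsed_seconds(stamp):
    """Convert 'M:SS.ss' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

# Complete lmi rebuilds, wall-clock times copied from this message.
timings = {
    "as migrated, warm": ["3:18.67", "3:22.25"],
    "Sandy Bridge 1-8-2": ["2:58.94", "2:56.92"],
    "Sandy Bridge 2-8-2": ["2:54.64", "2:54.56"],
}

means = {
    label: mean([elapsed_seconds(s) for s in stamps])
    for label, stamps in timings.items()
}
for label, value in means.items():
    print("%-20s mean %.2f s" % (label, value))

# Gap between the two topologies, and the spread within the 1-8-2 pair.
gap = means["Sandy Bridge 1-8-2"] - means["Sandy Bridge 2-8-2"]
spread = abs(elapsed_seconds("2:58.94") - elapsed_seconds("2:56.92"))
print("topology gap: %.2f s (%.1f%%)" % (gap, 100 * gap / means["Sandy Bridge 1-8-2"]))
print("spread within 1-8-2 pair: %.2f s" % spread)
```

The 2-8-2 mean is about 3.3 s (roughly 2%) below the 1-8-2 mean, while
the two 1-8-2 runs themselves differ by about 2 s, so two samples per
configuration cannot really settle the question.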

Just for laughs, this might be the only time anyone has run 32-bit msw-xp
with 24 vCPUs:

5:32.59 --jobs=24
4:48.89 --jobs=16

There was RAM to spare throughout those runs. One might conjecture that
msw-xp chokes on that many vCPUs. I guess I won't try 32.

> ...and here are results for MinGW-w64 gcc-4.9.2 as installed by
> lmi's current 'install_cygwin.bat' (package 'mingw64-i686-gcc-g++'):
> 
> make --jobs=4 install check_physical_closure > ../log 2>&1
> 1324.14s user 581.83s system 341% cpu 9:17.47 total

 - wx compiled without PCH:
make --jobs=4 install check_physical_closure > ../log 2>&1
860.19s user 799.93s system 347% cpu 7:58.31 total

 - wx compiled with PCH--no apparent difference:
make --jobs=4 install check_physical_closure > ../log 2>&1
860.88s user 801.01s system 347% cpu 7:58.89 total

New hardware makes a small difference for gcc-4.9.2, though it seemed
to make no difference for gcc-3.4.5 .



