Re: [Qemu-devel] [PATCH v2 1.3] build: compile translate.o with -fno-gcse option


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v2 1.3] build: compile translate.o with -fno-gcse option
Date: Wed, 28 Nov 2012 08:29:37 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0

On 27/11/2012 19:17, Stefan Weil wrote:
> A real problem could arise from compilers which don't support -fno-gcse.

It was introduced in GCC 3.0.

> As this option is not checked for compatibility in configure, such 
> compilers would no longer work with unmodified QEMU sources. clang
> obviously supports -fno-gcse, so maybe we don't have a real problem
> currently.

Yes.
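
For reference, a configure-time compatibility check of the kind Stefan
mentions could look roughly like the sketch below.  It is not QEMU's
actual configure code: the helper name compiler_accepts_flag and the use
of -Werror are assumptions made for illustration.  It compiles a trivial
C file with the flag and reports whether the compiler accepted it.

```python
import os
import subprocess
import tempfile


def compiler_accepts_flag(compiler: str, flag: str) -> bool:
    """Return True if `compiler` builds a trivial C file with `flag`.

    Hypothetical helper, not taken from QEMU's configure script.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")
        try:
            # -Werror (an assumption of this sketch) makes a flag that is
            # only warned about count as unsupported.
            result = subprocess.run(
                [compiler, "-Werror", flag, "-c", src, "-o", obj],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # Compiler binary missing or not executable.
            return False
        return result.returncode == 0
```

A build script could call compiler_accepts_flag("gcc", "-fno-gcse") and
add the option to the translate.o flags only when it returns True.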

> For the buildbot machines, "configure --enable-debug" would solve the
> OOM problem, dramatically reduce compilation time, add some
> compile time checks for TCG, reduce CO2 emission, ... For most buildbots,
> --enable-debug would be a good choice. There are some kinds of errors
> which compilers only detect during their optimization pass, so some
> buildbots should still run without --enable-debug.

No, --enable-debug is not a solution.  Fixing GCC bugs, or working
around them if possible/useful, is.

> Do we need -fno-gcse for all */translate.c or only for some of them?

Intel is an order of magnitude worse than the others; however, all of
the translate.c are potentially susceptible to this problem.  It happens
when you have -fPIE or -fPIC, and largish functions that access a lot of
globals.  translate.c tends to use tcg_ctx, and to inline almost
everything into disas_insn... hence the problem.

Intel is the worst, but SPARC also requires 300MB for GCSE.  PPC is
special: it "only" needs 55MB for GCSE, but 150MB for inlining and
similarly for other passes---more than other targets.  I put "only" in
quotes because even Intel with a patched GCC requires only 1.5MB for
GCSE, and without sacrificing any optimization.

> I think it is caused by huge switch statements
> in those files.

GCC can handle much worse control flow.  Over the years, the developers
got really fiendish testcases, mostly template-heavy C++ code or
computer-generated.  These testcases have a single huge program in a
single function, and are "interesting" to say the least.

In this case the memory needed is indeed quadratic, but (roughly) in the
number of globals that are accessed in the function.  GCC uses a garbage
collector, but it runs it only between optimization passes in general;
usually it doesn't find that much garbage.  In this case, GCSE produces
hundreds of MB of garbage.  Fixing the bug is just a matter of moving
some invariant stuff out of an inner loop (interestingly it doesn't save
much computation time, only memory).
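
To put numbers on the quadratic growth, here is a back-of-the-envelope
sketch.  It assumes memory scales exactly with the square of a count n
(the number of globals accessed, per the paragraph above).  The function
name scaled_memory_mib and the base count of 100 are hypothetical, and
the 2 GiB starting point is Stefan's estimate quoted below, not a
measurement.

```python
def scaled_memory_mib(base_mib: float, base_n: int, new_n: int) -> float:
    """Scale a memory figure under a purely quadratic model: mem ~ n * n."""
    if base_n <= 0:
        raise ValueError("base_n must be positive")
    return base_mib * (new_n / base_n) ** 2


# Halving n cuts the modelled peak to a quarter: 2 GiB -> 0.5 GiB.
# The base count of 100 is arbitrary; only the ratio matters.
print(scaled_memory_mib(2048, 100, 50))  # prints 512.0
```

Under this model, splitting a function would help only to the extent
that each piece accesses fewer globals than the original did.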

Paolo

> Splitting those switch statements might also help.
> If the memory needed grows with n * n (n = number of case statements
> in one switch statement), then splitting a switch statement in two
> would reduce the memory needed from 2 GiB to 0.5 GiB.
> 
> Regards
> Stefan
> 
> 
> 



