Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions


From: Al Viro
Subject: Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
Date: Tue, 8 Jul 2014 21:12:34 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Jul 08, 2014 at 08:32:55PM +0100, Peter Maydell wrote:
> On 8 July 2014 18:20, Al Viro <address@hidden> wrote:
> > On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote:
> >
> >> > Incidentally, combination of --enable-gprof and (default) --enable-pie
> >> > won't build - it dies with ld(1) complaining about relocs in gcrt1.o.
> >>
> >> This sounds like a toolchain bug to me :-)
> >
> > Debian stable/amd64, gcc 4.7.2, binutils 2.22.  And google search finds
> > this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html.
> > That one has gcc 4.4.3.
> 
> That just makes it a long-standing toolchain bug. I don't see any
> reason why PIE + gprof shouldn't work, it just looks like gprof
> doesn't ship and link a PIE runtime.

*nod*

It's not a huge itch to scratch for me, and I'm not even sure whether the
bug should be filed for gcc or for libc (probably the latter).  In any case,
having that information findable in list archives would probably be a good
thing.

Again, gprof isn't particularly useful - kernel-side profilers are at least as
good.  So I suspect that most of the people running into that simply shrug and
use those instead.  Narrowing it down to -pie didn't take long and I can
confirm that this is the root cause of that breakage.  Should make debugging
said toolchain bug a bit easier, if anybody cares to do that...

> > Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and
> > going through their kernel package build.  I have perf report in front of
> > me right now; the top ones are
> >  41.77%  qemu-system-alp  perf-24701.map           [.] 0x7fbbee558930
> >  11.78%  qemu-system-alp  qemu-system-alpha        [.] cpu_alpha_exec
> 
> > and cpu_alpha_exec() spends most of the time in inlined tb_find_fast().
> > It might be worth checking the actual distribution of the hash of virt
> > address used by that sucker - I wonder if dividing its argument by 4
> > wouldn't improve the things, but I don't have stats on actual frequency
> > of conflicts, etc.  In any case, the first lump (42%) seems to be tastier 
> > ;-)
> 
> Depends on your point of view -- arguably we ought to be spending *more*
> time executing translated guest code... (As you say, the problem is that
> we don't have any breakdown of what things might turn out to be hotspots
> in the translated code.)
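
On the tb_find_fast() hash question quoted above: a rough way to see why
dividing the address by 4 might matter is to count bucket usage for aligned
addresses.  The sketch below is only an illustration under stated assumptions:
toy_hash(), TABLE_BITS and the synthetic address list are hypothetical, and
qemu's real hash may mix address bits differently, so this shows the alignment
effect alone, not measured conflict rates.

```python
from collections import Counter

TABLE_BITS = 12  # hypothetical table size (4096 buckets), not qemu's actual value


def toy_hash(pc, shift=0):
    """Toy mask-based hash: drop `shift` low bits, then keep TABLE_BITS bits."""
    return (pc >> shift) & ((1 << TABLE_BITS) - 1)


def bucket_stats(pcs, shift=0):
    """Return (buckets used, population of the fullest bucket) for the given PCs."""
    counts = Counter(toy_hash(pc, shift) for pc in pcs)
    return len(counts), max(counts.values())


# Synthetic workload: 4096 consecutive 4-byte-aligned instruction addresses.
pcs = [0x120000000 + 4 * i for i in range(4096)]

print(bucket_stats(pcs, shift=0))  # low two bits are always zero
print(bucket_stats(pcs, shift=2))  # same addresses divided by 4
```

With this toy hash the unshifted addresses reach only a quarter of the
buckets, since the two low bits never vary.  Feeding it addresses logged from
a real run would be needed to say anything about actual conflict frequency.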

Might be a fun project to teach perf that hits in such-and-such page should
lead to lookup in a table annotating it.  As in "offsets 42..69 should be
recorded as (<this address> + offset - 42)".  Then tcg could generate
such tables and we'd get information like "that much time is spent in
the second host insn of instances of that code pattern generated by
tcg_gen_shr_i64", etc.
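
A minimal sketch of the lookup side of that idea, assuming a sorted table of
offset ranges per page.  Nothing here is an existing perf or tcg interface:
Range, build_index() and remap() are hypothetical names, and the label field
is just an example annotation.

```python
import bisect
from typing import NamedTuple


class Range(NamedTuple):
    start: int   # first offset in the page covered by this entry
    end: int     # last offset covered (inclusive)
    target: int  # address that offset `start` should be recorded as
    label: str   # hypothetical annotation, e.g. the generator's name


def build_index(ranges):
    """Sort the ranges and keep a parallel list of start offsets for bisect."""
    ordered = sorted(ranges)
    return [r.start for r in ordered], ordered


def remap(index, offset):
    """Map a sampled offset to (target + offset - start, label), or None."""
    starts, ordered = index
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        return None
    r = ordered[i]
    if offset > r.end:
        return None
    return r.target + offset - r.start, r.label


# "offsets 42..69 should be recorded as (<this address> + offset - 42)"
index = build_index([Range(42, 69, 0x1000, "tcg_gen_shr_i64")])
print(remap(index, 50))
print(remap(index, 70))
```

Samples that fall outside every range come back as None, so a profiler could
leave those hits attributed to the raw address as it does now.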

No idea if anything of that sort exists - qemu is not the only possible user
for that; looks like it might be useful for any JIT profiling, so somebody
could've done that already...


