Emilio G. Cota
Re: [Qemu-ppc] [Qemu-devel] linux-user crashes on clone(2) when run on ppc host
Thu, 18 Jun 2015 14:36:06 -0400
On Thu, Jun 18, 2015 at 15:55:54 +0100, Peter Maydell wrote:
> I'd forgotten we had that mutex. However it's not actually
> a sufficient fix for the problem. What needs to happen is
> (a) somebody actually sits down and figures out what data
> structures we have and what locking/per-cpuness/etc they need,
> ie a design
> (b) somebody implements that design
I'm doing exactly (a) and (b). In doing so I've found this crash,
and I think that it is not due to races in QEMU--it seems
to be a ppc64 issue.
> This is happening as part of the TCG multithreading work:
I'm closely following these discussions. My goal is to
have a sane multithreaded linux-user first, and then move on
> This is the bug we've had kicking around for a while about
> multithreading races:
I've tried to reproduce it, but I can't (although I could easily
trigger it with the qemu that is packaged by Ubuntu 14.04). I've
added a message to the thread stating this.
> As just one example race, consider the possibility that
> thread A calls tb_gen_code, which calls tb_alloc, which
> calls tb_flush, which clears the whole code cache, and then
> tb_gen_code starts generating code over the top of a TB
> that thread B was in the middle of executing from...
Agreed, this needs to be fixed. Certainly not the problem I'm
reporting here, however.
> >> On 17 June 2015 at 22:36, Emilio G. Cota <address@hidden> wrote:
> >> > I don't think this is a race because it also breaks when
> >> > run on a single core (with taskset -c 0).
> > As I said, this problem doesn't seem to be a race.
> The multiple threads will still all be racing with each
> other on the single core.
How? Even the bug report on launchpad cannot be triggered (on a
relatively ancient qemu) if the program is pinned to just one host core.
> In general I don't see much benefit in detailed investigation
> into the precise reason why a guest program crashes when
> the whole area is known to be fundamentally not designed
I'm working on such a design, not just on paper but with working
code. For instance, I'm running qemu on ppc64 to test an
initial solution to the TSO on RMO problem, i.e. the memory
consistency mismatch. ppc64 is unfortunately the only SMP RMO
machine I could get access to--I'd be happy to test this on an
ARM SMP and forget about ppc64 for now if I could.
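To illustrate the mismatch: a TSO guest such as x86 assumes that two stores from one thread become visible to other threads in program order, while an RMO host such as ppc64 or ARM may reorder them unless a barrier is placed between them. The following is a minimal sketch of the classic message-passing test in C11, not QEMU code and not the solution mentioned above. All names (payload, ready, run_message_passing) are hypothetical. With the release/acquire pair shown, the consumer can never see the flag without the payload. Weakening both to memory_order_relaxed would permit a stale read on a weakly ordered host.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical names throughout; this is an illustration, not QEMU code. */
static int payload;       /* plain data written before the flag */
static atomic_int ready;  /* flag that publishes the payload */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the payload store may not be reordered after this store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;
    /* Acquire: the payload load may not be reordered before this load. */
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        sched_yield();  /* let the producer run if we share a core */
    }
    *seen = payload;
    return NULL;
}

/*
 * Runs the test `iterations` times. Returns the number of runs in which
 * the consumer saw the flag but read a stale payload, or -1 if a thread
 * could not be created.
 */
int run_message_passing(int iterations)
{
    int violations = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t p, c;
        int seen = 0;

        payload = 0;
        atomic_store(&ready, 0);

        if (pthread_create(&c, NULL, consumer, &seen) != 0) {
            return -1;
        }
        if (pthread_create(&p, NULL, producer, NULL) != 0) {
            /* Unblock the consumer before bailing out. */
            atomic_store(&ready, 1);
            pthread_join(c, NULL);
            return -1;
        }
        pthread_join(p, NULL);
        pthread_join(c, NULL);

        if (seen != 42) {
            violations++;
        }
    }
    return violations;
}
```

Calling run_message_passing(1000) from a small main() and linking with -pthread should report zero violations on any host, because the C11 memory model guarantees it for this release/acquire pairing.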
To me this looks like a ppc64 issue, and I'd be very grateful
if ppc64 folks could take a look.