qemu-arm



From: Richard Henderson
Subject: Re: [Qemu-arm] [RFC DEBUG PATCH 3/3] translate-a64: fix lookup_tb_ptr hang (DEBUG!)
Date: Sat, 10 Jun 2017 09:59:19 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0

On 06/10/2017 01:51 AM, Alex Bennée wrote:

Richard Henderson <address@hidden> writes:

On 06/09/2017 10:01 AM, Alex Bennée wrote:
THIS IS A DEBUG PATCH DO NOT MERGE

I include all the comments to show my working. I was trying to
isolate which instructions cause the problem. It turns out it is the
RET instruction. I don't understand why, because AFAICT it is
pretty much a BR instruction.

Yeah, same thing for Alpha.

It has been my guess that not chaining through RET means that we get
back to the main loop regularly and often, letting interrupts be
recognized in a timely manner.

I can't figure out why that would be, however, since interrupts
*ought* to be setting icount_decr, and the TB to which we chain *is*
checking that to return to the main loop.
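To make the expectation above concrete, here is a minimal toy model in C of "every chained block checks an exit flag before it runs". It is only a sketch under stated assumptions: `ToyTB`, `ToyCPU`, `exit_request` and `toy_exec_chain` are hypothetical names, not QEMU's, and the interrupt is simulated by setting the flag after a fixed number of blocks rather than from another thread.

```c
#include <stddef.h>

/* Hypothetical toy model; none of these names are QEMU's. */
typedef struct ToyTB ToyTB;
struct ToyTB {
    int id;
    ToyTB *next;                 /* chained successor, NULL if unchained */
};

typedef struct {
    volatile int exit_request;   /* stands in for the icount_decr flag */
    int blocks_run;              /* number of blocks actually executed */
} ToyCPU;

/*
 * Follow the chain starting at tb. Before running each block, test the
 * exit flag; if it is set, return that block unexecuted, which models
 * going back to the main loop. Returns NULL if the chain simply ends.
 * irq_after simulates an interrupt arriving once that many blocks ran.
 */
static ToyTB *toy_exec_chain(ToyCPU *cpu, ToyTB *tb, int irq_after)
{
    while (tb != NULL) {
        if (cpu->exit_request) {
            return tb;           /* prologue check fired: leave the chain */
        }
        cpu->blocks_run++;       /* "execute" the block body */
        if (cpu->blocks_run == irq_after) {
            cpu->exit_request = 1;   /* simulated interrupt */
        }
        tb = tb->next;           /* chained jump to the successor */
    }
    return NULL;
}
```

In this model even two blocks chained into a cycle leave the loop one block after the flag is set, which is why a hang here points at the flag not being set or not being observed, rather than at chaining as such.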

Indeed - if that was broken a lot more stuff wouldn't work.

Since changing the timing affects the outcome (e.g. -d exec), it
follows that this *must* be some sort of race condition.  But since
this still happens with single-threaded mode, I can't imagine what
sort of race condition it might be.

Apart from timer expiry I can't think what other interactions the other
threads have on the main TCG thread. I guess there is IO but my test
hangs way before the kernel starts poking the disk. Is there an
interaction between IRQs and QEMU's serial driver?

The Alpha hang appears to be timer expiry, in that it happens as soon as the kernel spawns some kthreads to finish up the boot process. The kernel then sits in the idle loop for an unreasonably long time.

Bizarrely, it does complete the boot eventually, but it takes ~5 minutes to do so, when we ought to be able to boot to a prompt in seconds.

More data points.  I removed the tb_htable_lookup, and that by itself
is enough to fix Alpha booting.  But it doesn't help the aarch64
kernel+image that I have.  Which does still boot with -d nochain
(which, along with disabling goto_tb chaining, also disables all
goto_ptr).

I wonder what is different about your aarch64 image and mine then?
Because mine works just with suppressing the chaining for RET.

Oh, I just tried -d nochain because it doesn't require source modification.

Not really sure where to go from here.

I would agree with Emilio that we revert but I can't quite shake the
feeling we are missing an underlying problem. Would just skipping the
htable lookup (but keeping the tb_jmp_cache) be an OK fix for now?
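For readers following along, the proposal above (keep the per-CPU jump cache, skip the hash-table fallback) can be sketched with a toy two-level lookup in C. This is an illustration under assumptions, not the QEMU implementation: `ToyBlock`, `ToyState`, `toy_lookup` and the `use_htable` switch are hypothetical, and the key is the PC alone, whereas a real lookup would have to match more state than that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_JC_SIZE 16u          /* direct-mapped jump cache entries */
#define TOY_HT_SIZE 64u          /* hash table buckets */

/* Hypothetical types; the names are for illustration only. */
typedef struct ToyBlock {
    uint64_t pc;
    struct ToyBlock *ht_next;    /* next block in the same bucket */
} ToyBlock;

typedef struct {
    ToyBlock *jmp_cache[TOY_JC_SIZE];   /* fast, per-CPU, may miss */
    ToyBlock *htable[TOY_HT_SIZE];      /* slower, complete */
} ToyState;

static unsigned toy_jc_index(uint64_t pc) { return (unsigned)(pc >> 2) % TOY_JC_SIZE; }
static unsigned toy_ht_index(uint64_t pc) { return (unsigned)(pc >> 2) % TOY_HT_SIZE; }

/* Register a block in the hash table (head insertion into its bucket). */
static void toy_ht_insert(ToyState *s, ToyBlock *b)
{
    unsigned i = toy_ht_index(b->pc);
    b->ht_next = s->htable[i];
    s->htable[i] = b;
}

/*
 * Look up the block for pc. The jump cache is always tried first. With
 * use_htable == false a cache miss returns NULL, which in this model
 * means "go back to the main loop and do the full lookup there".
 */
static ToyBlock *toy_lookup(ToyState *s, uint64_t pc, bool use_htable)
{
    ToyBlock *b = s->jmp_cache[toy_jc_index(pc)];
    if (b != NULL && b->pc == pc) {
        return b;                        /* jump cache hit */
    }
    if (!use_htable) {
        return NULL;                     /* fallback disabled */
    }
    for (b = s->htable[toy_ht_index(pc)]; b != NULL; b = b->ht_next) {
        if (b->pc == pc) {
            s->jmp_cache[toy_jc_index(pc)] = b;   /* refill the cache */
            return b;
        }
    }
    return NULL;
}
```

With the fallback disabled, only blocks already present in the jump cache are reached directly; every other target makes a trip through the main loop, so skipping the fallback also changes how often the main loop runs.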

I agree.  It seems like there's some real problem that this is uncovering.

Dropping the htable lookup is certainly ok by me, if that's enough to un-stick your regression testing for the aarch64 guest.


r~


