From: Peter Maydell
Subject: Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
Date: Wed, 5 Jul 2017 20:42:25 +0100

On 5 July 2017 at 20:30, Alex Bennée <address@hidden> wrote:
>
> Peter Maydell <address@hidden> writes:
>
>> On 5 July 2017 at 17:01, Alex Bennée <address@hidden> wrote:
>>> An interesting bug was reported on #qemu today. It was bisected to
>>> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
>>> with taskset -c 0. Originally the fingers were pointed at mttcg but it
>>> occurs in both single and multi-threaded modes.
>>>
>>> I think the problem is qemu_system_reset_request() is certainly racy
>>> when resetting a running CPU. AFAICT:
>>>
>>>   - Guest resets board, writing to some hw address (e.g.
>>>     arm_sysctl_write)
>>>   - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>>   - We exit iowrite and drop the BQL
>>>   - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>>   - we start writing new values to CPU env while still in TCG code
>>>   - CHAOS!
>>>
>>> The general solution for this is to ensure this sort of task is done
>>> as safe work in the CPU's context, when we know nothing else is running.
>>> It seems this is probably best done by modifying
>>> qemu_system_reset_request to queue work up on current_cpu and execute it
>>> as safe work - I don't think the vl.c thread should ever be messing
>>> about with calling cpu_reset directly.
>>
>> My first thought is that qemu_system_reset() should absolutely
>> stop every CPU (or other runnable thing like a DMA agent) in the
>> system.
>
> Are all these reset calls system wide though?

It's called 'system_reset' because it resets the entire system...

> After all, with PSCI you
> can bring individual cores up and down. I appreciate the vexpress stuff
> pre-dates those well defined semantics though.

It's individual core reset that's a more ad-hoc afterthought,
really.

> vm_stop certainly tries to deal with things gracefully as well as send
> qapi events, drain IO queues and the rest of it. My only concern is it
> handles two cases - external vm_stops and those from the current CPU.
>
> I think it may be cleaner for CPU originated halts to use the
> async_safe_run_on_cpu() mechanism.

System reset already has an async component to it -- you call
qemu_system_reset_request(), which just says "schedule a system
reset as soon as convenient". qemu_system_reset() is the thing
that runs later and actually does the job (from the io thread,
not the CPU thread).
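
To make the two-phase shape concrete, here is a minimal single-threaded sketch. All names in it (request_reset, main_loop_service_reset, cpu_reg and so on) are hypothetical stand-ins for illustration, not the actual vl.c code: the requester only records that a reset is wanted, and a later main-loop step does the work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT QEMU's real types or globals. */
typedef enum { CAUSE_NONE, CAUSE_GUEST_RESET } reset_cause;

reset_cause pending_reset = CAUSE_NONE; /* recorded by the requester */
int cpu_reg = 0;                        /* pretend CPU state */
int resets_done = 0;                    /* how many resets actually ran */

/* Phase 1: called from the "CPU" side. Only records the request. */
void request_reset(reset_cause cause)
{
    pending_reset = cause;
}

/* Phase 2: called later from the "main loop". Performs the reset if one
 * was requested, and reports whether it did anything. */
bool main_loop_service_reset(void)
{
    if (pending_reset == CAUSE_NONE) {
        return false;
    }
    pending_reset = CAUSE_NONE;
    cpu_reg = 0;
    resets_done++;
    return true;
}

/* Small walk-through of the ordering: request first, reset later. */
void two_phase_demo(void)
{
    cpu_reg = 42;
    request_reset(CAUSE_GUEST_RESET);
    assert(cpu_reg == 42);              /* nothing has been reset yet */
    assert(main_loop_service_reset());  /* the deferred step does the job */
    assert(cpu_reg == 0);
}
```

The sketch deliberately ignores threading; it only shows that the request and the reset are separate steps run from different contexts.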

Looking more closely at the vl.c code, it looks like it
calls pause_all_vcpus() before calling qemu_system_reset():
shouldn't that be pausing all the TCG CPUs?
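
The ordering I would expect is roughly the one in this toy pthread model. It is an assumption-laden sketch, not QEMU code: toy_vcpu and every toy_* function are invented for illustration. The worker runs "guest code" without holding the lock, and the resetting thread touches CPU state only after the worker has acknowledged the pause.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical toy vCPU; not QEMU's CPUState. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool pause_requested;
    bool paused;
    bool quit;
    long reg; /* pretend register, written by the worker while running */
} toy_vcpu;

void toy_vcpu_init(toy_vcpu *cpu)
{
    pthread_mutex_init(&cpu->lock, NULL);
    pthread_cond_init(&cpu->cond, NULL);
    cpu->pause_requested = cpu->paused = cpu->quit = false;
    cpu->reg = 0;
}

/* Worker: "executes guest code" outside the lock, parks when asked to. */
void *toy_vcpu_thread(void *opaque)
{
    toy_vcpu *cpu = opaque;
    pthread_mutex_lock(&cpu->lock);
    while (!cpu->quit) {
        if (cpu->pause_requested) {
            cpu->paused = true;                 /* acknowledge the pause */
            pthread_cond_broadcast(&cpu->cond);
            while (cpu->pause_requested && !cpu->quit) {
                pthread_cond_wait(&cpu->cond, &cpu->lock);
            }
            cpu->paused = false;
            continue;
        }
        pthread_mutex_unlock(&cpu->lock);
        cpu->reg++;                             /* running without the lock */
        pthread_mutex_lock(&cpu->lock);
    }
    pthread_mutex_unlock(&cpu->lock);
    return NULL;
}

/* Ask the worker to park and wait until it has acknowledged. */
void toy_pause(toy_vcpu *cpu)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->pause_requested = true;
    while (!cpu->paused) {
        pthread_cond_wait(&cpu->cond, &cpu->lock);
    }
    pthread_mutex_unlock(&cpu->lock);
}

void toy_resume(toy_vcpu *cpu)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->pause_requested = false;
    pthread_cond_broadcast(&cpu->cond);
    pthread_mutex_unlock(&cpu->lock);
}

/* Reset is only legal while the worker is parked. */
void toy_reset(toy_vcpu *cpu)
{
    pthread_mutex_lock(&cpu->lock);
    assert(cpu->paused);
    cpu->reg = 0;
    pthread_mutex_unlock(&cpu->lock);
}

void toy_quit(toy_vcpu *cpu)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->quit = true;
    pthread_cond_broadcast(&cpu->cond);
    pthread_mutex_unlock(&cpu->lock);
}
```

In this model the sequence would be toy_pause(), toy_reset(), toy_resume(). If a reset could run while the worker was still inside its unlocked section, that would be the kind of race described at the top of the thread.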

thanks
-- PMM


