[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking re
From: |
Alex Bennée |
Subject: |
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime |
Date: |
Wed, 05 Jul 2017 20:30:01 +0100 |
User-agent: |
mu4e 0.9.19; emacs 25.2.50.3 |
Peter Maydell <address@hidden> writes:
> On 5 July 2017 at 17:01, Alex Bennée <address@hidden> wrote:
>> An interesting bug was reported on #qemu today. It was bisected to
>> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
>> with taskset -c 0. Originally the fingers where pointed at mttcg but it
>> occurs in both single and multi-threaded modes.
>>
>> I think the problem is qemu_system_reset_request() is certainly racy
>> when resetting a running CPU. AFAICT:
>>
>> - Guest resets board, writing to some hw address (e.g.
>> arm_sysctl_write)
>> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>> - We exit iowrite and drop the BQL
>> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>> - we start writing new values to CPU env while still in TCG code
>> - CHAOS!
>>
>> The general solution for this is to ensure these sort of tasks are done
>> with safe work in the CPUs context when we know nothing else is running.
>> It seems this is probably best done by modifying
>> qemu_system_reset_request to queue work up on current_cpu and execute it
>> as safe work - I don't think the vl.c thread should ever be messing
>> about with calling cpu_reset directly.
>
> My first thought is that qemu_system_reset() should absolutely
> stop every CPU (or other runnable thing like a DMA agent) in the
> system.
Are all these reset calls system wide though? After all with PCSI you
can bring individual cores up and down. I appreciate the vexpress stuff
pre-dates those well defined semantics though.
> The semantics are basically "like a power cycle", so
> that should include a complete stop of the world. (Is this
> what vm_stop() does? Dunno...)
vm_stop certainly tries to deal with things gracefully as well as send
qapi events, drain IO queues and the rest of it. My only concern is it
handles two cases - external vm_stops and those from the current CPU.
I think it may be cleaner for CPU originated halts to use the
async_safe_run_on_cpu() mechanism. It has clear semantics with respect
to the behaviour of other CPUs. If you queue work with
async_safe_run_on_cpu and do a cpu_loop_exit you can guarantee all vCPUs
have stopped and the work has been serviced before the originating vCPU
executes its next instruction.
>
> thanks
> -- PMM
--
Alex Bennée