[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC] cpus: avoid get stuck in pause_all_vcpus
From: |
Longpeng (Mike, Cloud Infrastructure Service Product Dept.) |
Subject: |
Re: [RFC] cpus: avoid get stuck in pause_all_vcpus |
Date: |
Fri, 13 Mar 2020 09:43:44 +0800 |
User-agent: |
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 |
On 2020/3/12 23:28, Paolo Bonzini wrote:
> On 10/03/20 10:14, Longpeng(Mike) wrote:
>> From: Longpeng <address@hidden>
>>
>> We find an issue when repeat reboot in guest during migration, it cause the
>> migration thread never be waken up again.
>>
>> <main loop> |<migration_thread>
>> |
>> LOCK BQL |
>> ... |
>> main_loop_should_exit |
>> pause_all_vcpus |
>> 1. set all cpus ->stop=true |
>> and then kick |
>> 2. return if all cpus is paused |
>> (by '->stopped == true'), else|
>> 3. qemu_cond_wait [BQL UNLOCK] |
>> |LOCK BQL
>> |...
>> |do_vm_stop
>> | pause_all_vcpus
>> | (A)set all cpus ->stop=true
>> | and then kick
>> | (B)return if all cpus is paused
>> | (by '->stopped == true'), else
>> | (C)qemu_cond_wait [BQL UNLOCK]
>> 4. be waken up and LOCK BQL | (D)be waken up BUT wait for BQL
>> 5. goto 2. |
>> (BQL is still LOCKed) |
>> resume_all_vcpus |
>> 1. set all cpus ->stop=false |
>> and ->stopped=false |
>> ... |
>> BQL UNLOCK | (E)LOCK BQL
>> | (F)goto B. [but stopped is false now!]
>> |Finally, sleep at step 3 forever.
>>
>>
>> Note: This patch is just for discuss this issue, I'm looking forward to
>> your suggestions, thanks!
>
> Thanks Mike,
>
> the above sketch is really helpful.
>
> I think the problem is not that pause_all_vcpus() is not pausing hard
> enough; the problem is rather than resume_all_vcpus(), when used outside
> vm_start(), should know about the race and do nothing if it happens.
>
> Fortunately resume_all_vcpus does not release the BQL so it should be
> enough to test once; translated to code, this would be the patch to fix it:
>
> diff --git a/cpus.c b/cpus.c
> index b4f8b84b61..1eb7533a91 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1899,6 +1899,10 @@ void resume_all_vcpus(void)
> {
> CPUState *cpu;
>
> + if (!runstate_is_running()) {
> + return;
> + }
> +
Hi Paolo,
The runstate of my above sketch is running, so maybe your patch can fix some
other issues but not mine ?
main_loop_should_exit
( *reset* requested )
pause_all_vcpus
resume_all_vcpus
if (!runstate_check(RUN_STATE_RUNNING) &&
!runstate_check(RUN_STATE_INMIGRATE)) {
runstate_set(RUN_STATE_PRELAUNCH);
...
migration_completion
vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
vm_stop ( if runstate_is_running )
do_vm_stop
pause_all_vcpus ( if runstate_is_running )
> qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
> CPU_FOREACH(cpu) {
> cpu_resume(cpu);
>
>
> Thanks,
>
> Paolo
>
> .
>
---
Regards,
Longpeng(Mike)
- [RFC] cpus: avoid get stuck in pause_all_vcpus, Longpeng(Mike), 2020/03/10
- Re: [RFC] cpus: avoid get stuck in pause_all_vcpus, no-reply, 2020/03/10
- Re: [RFC] cpus: avoid get stuck in pause_all_vcpus, Paolo Bonzini, 2020/03/12
- Re: [RFC] cpus: avoid get stuck in pause_all_vcpus,
Longpeng (Mike, Cloud Infrastructure Service Product Dept.) <=
- Re: [RFC] cpus: avoid get stuck in pause_all_vcpus, Paolo Bonzini, 2020/03/13
- Re: [RFC] cpus: avoid get stuck in pause_all_vcpus, Longpeng (Mike, Cloud Infrastructure Service Product Dept.), 2020/03/13
- Re: [RFC] cpus: avoid get stuck in pause_all_vcpus, Paolo Bonzini, 2020/03/13
- Re: [RFC] cpus: avoid get stuck in pause_all_vcpus, Longpeng (Mike, Cloud Infrastructure Service Product Dept.), 2020/03/13