qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC] cpus: avoid get stuck in pause_all_vcpus


From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Subject: Re: [RFC] cpus: avoid get stuck in pause_all_vcpus
Date: Fri, 13 Mar 2020 09:43:44 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1


On 2020/3/12 23:28, Paolo Bonzini wrote:
> On 10/03/20 10:14, Longpeng(Mike) wrote:
>> From: Longpeng <address@hidden>
>>
>> We find an issue when repeat reboot in guest during migration, it cause the
>> migration thread never be waken up again.
>>
>> <main loop>                        |<migration_thread>
>>                                    |
>> LOCK BQL                           |
>> ...                                |
>> main_loop_should_exit              |
>>  pause_all_vcpus                   |
>>   1. set all cpus ->stop=true      |
>>      and then kick                 |
>>   2. return if all cpus is paused  |
>>      (by '->stopped == true'), else|
>>   3. qemu_cond_wait [BQL UNLOCK]   |
>>                                    |LOCK BQL
>>                                    |...
>>                                    |do_vm_stop
>>                                    | pause_all_vcpus
>>                                    |  (A)set all cpus ->stop=true
>>                                    |     and then kick
>>                                    |  (B)return if all cpus is paused
>>                                    |     (by '->stopped == true'), else
>>                                    |  (C)qemu_cond_wait [BQL UNLOCK]
>>   4. be waken up and LOCK BQL      |  (D)be waken up BUT wait for  BQL
>>   5. goto 2.                       |
>>  (BQL is still LOCKed)             |
>>  resume_all_vcpus                  |
>>   1. set all cpus ->stop=false     |
>>      and ->stopped=false           |
>> ...                                |
>> BQL UNLOCK                         |  (E)LOCK BQL
>>                                    |  (F)goto B. [but stopped is false now!]
>>                                    |Finally, sleep at step 3 forever.
>>
>>
>> Note: This patch is just for discuss this issue, I'm looking forward to
>>       your suggestions, thanks!
> 
> Thanks Mike,
> 
> the above sketch is really helpful.
> 
> I think the problem is not that pause_all_vcpus() is not pausing hard
> enough; the problem is rather than resume_all_vcpus(), when used outside
> vm_start(), should know about the race and do nothing if it happens.
> 
> Fortunately resume_all_vcpus does not release the BQL so it should be
> enough to test once; translated to code, this would be the patch to fix it:
> 
> diff --git a/cpus.c b/cpus.c
> index b4f8b84b61..1eb7533a91 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1899,6 +1899,10 @@ void resume_all_vcpus(void)
>  {
>      CPUState *cpu;
> 
> +    if (!runstate_is_running()) {
> +        return;
> +    }
> +
Hi Paolo,

The runstate of my above sketch is running, so maybe your patch can fix some
other issues but not mine ?

main_loop_should_exit
  ( *reset* requested )
  pause_all_vcpus
  resume_all_vcpus
  if (!runstate_check(RUN_STATE_RUNNING) &&
          !runstate_check(RUN_STATE_INMIGRATE)) {
      runstate_set(RUN_STATE_PRELAUNCH);
  ...


migration_completion
  vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
    vm_stop ( if runstate_is_running )
      do_vm_stop
        pause_all_vcpus ( if runstate_is_running )


>      qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
>      CPU_FOREACH(cpu) {
>          cpu_resume(cpu);
> 
> 
> Thanks,
> 
> Paolo
> 
> .
> 

---
Regards,
Longpeng(Mike)



reply via email to

[Prev in Thread] Current Thread [Next in Thread]