qemu-block

Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll


From: Fiona Ebner
Subject: Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
Date: Tue, 2 May 2023 12:03:41 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

On 28.04.23 at 18:54, Juan Quintela wrote:
> Kevin Wolf <kwolf@redhat.com> wrote:
>> On 28.04.2023 at 10:38, Juan Quintela wrote:
>>> Kevin Wolf <kwolf@redhat.com> wrote:
>>>>> I am perhaps a bit naive here, but is there a way to convince
>>>>> qemu that snapshot_save_job_bh *HAS* to run on the main thread?
>>>>
>>>> I believe we're talking about a technicality here. I asked another more
>>>> fundamental question that nobody has answered yet:
>>>>
>>>> Why do you think that it's ok to call bdrv_writev_vmstate() without
>>>> holding the BQL?
>>>
>>> I will say this function starts with bdrv_ (i.e. block layer people) and
>>> ends with _vmstate (i.e. migration people).
>>>
>>> To be honest, I don't know.  That is why I _supposed_ you'd have an idea.
>>
>> My idea is that bdrv_*() can only be called when you hold the BQL, or
>> for BlockDriverStates in an iothread the AioContext lock.
>>
>> Apparently dropping the BQL in migration code was introduced in Paolo's
>> commit 9b095037527.
> 
> Damn.  I reviewed it, so I am as guilty as the author.
> After 10 years without problems, I will not blame that patch.
> 
> I guess we changed something else that broke doing it without the lock.
> 
> But no, I still don't have suggestions/ideas.
>
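
For context, the section in qemu_savevm_state() looks roughly like this
(a simplified sketch; the exact code has shifted around over the years):

    /* migration/savevm.c, qemu_savevm_state(), simplified sketch */
    qemu_mutex_unlock_iothread();
    qemu_savevm_state_header(f);
    qemu_savevm_state_setup(f);   /* setup callbacks run without the BQL */
    qemu_mutex_lock_iothread();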

I do feel like the issue might be very difficult to trigger under normal
circumstances. Depending on the configuration and what you do in the
guest, aio_poll in a vCPU thread does not happen often and I imagine
snapshot-save is also not a super frequent operation for most people. It
still takes me a while to trigger the issue by issuing lots of pflash
writes and running snapshot-save in a loop; I'd guess it takes about
30-60 snapshots. Another reason might be that generated co-wrappers were
less common in the past?

>> I'm not sure what this was supposed to improve in
>> the case of snapshots because the VM is stopped anyway.

Is it? Quoting Juan:

> d- snapshots are a completely different beast, that don't really stop
>    the guest in the same way at that point, and sometimes it shows in
>    these subtle details.

>> Would anything bad happen if we removed the BQL unlock/lock section in
>> qemu_savevm_state() again?
> 
> Dunno.
> 
> For what it's worth, I can say that it survives migration-test, but don't
> ask me why/how/...
> 
> Fiona, can you check if it fixes your troubles?
> 

Just removing the single section in qemu_savevm_state() breaks even the
case where snapshot_save_job_bh() is executed in the main thread,
because ram_init_bitmaps() will call qemu_mutex_lock_iothread_impl(),
which asserts that the BQL is not already held.
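
For reference, the assertion is the one at the top of
qemu_mutex_lock_iothread_impl() (roughly as in softmmu/cpus.c; slightly
simplified here):

    void qemu_mutex_lock_iothread_impl(const char *file, int line)
    {
        QemuMutexLockFunc bql_lock = qatomic_read(&qemu_bql_mutex_lock_func);

        /* Taking the BQL while it is already held is a bug. */
        g_assert(!qemu_mutex_iothread_locked());
        bql_lock(&qemu_bql_mutex, file, line);
        qatomic_set(&iothread_locked, true);
    }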

Additionally removing the lock/unlock pair in ram_init_bitmaps() seems
to work.
But I'm not sure what else a full semantic revert of commit 9b095037527
would entail.
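
For reference, the pair I removed is the one guarding the dirty log
setup in ram_init_bitmaps() (roughly as in migration/ram.c; simplified,
error and background-snapshot handling omitted):

    static void ram_init_bitmaps(RAMState *rs)
    {
        /* Needed for memory_global_dirty_log_start() below. */
        qemu_mutex_lock_iothread();
        qemu_mutex_lock_ramlist();

        WITH_RCU_READ_LOCK_GUARD() {
            ram_list_init_bitmaps();
            memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
            migration_bitmap_sync_precopy(rs);
        }
        qemu_mutex_unlock_ramlist();
        qemu_mutex_unlock_iothread();
    }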

Best Regards,
Fiona



