[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: QMP (without OOB) function running in thread different from the main
From: |
Fiona Ebner |
Subject: |
Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll |
Date: |
Tue, 2 May 2023 12:03:41 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 |
Am 28.04.23 um 18:54 schrieb Juan Quintela:
> Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 28.04.2023 um 10:38 hat Juan Quintela geschrieben:
>>> Kevin Wolf <kwolf@redhat.com> wrote:
>>>>> I am perhaps a bit ingenuous here, but it is there a way to convince
>>>>> qemu that snapshot_save_job_bh *HAS* to run on the main thread?
>>>>
>>>> I believe we're talking about a technicality here. I asked another more
>>>> fundamental question that nobody has answered yet:
>>>>
>>>> Why do you think that it's ok to call bdrv_writev_vmstate() without
>>>> holding the BQL?
>>>
>>> I will say this function starts by bdrv_ (i.e. block layer people) and
>>> endes with _vmstate (i.e. migration people).
>>>
>>> To be honest, I don't know. That is why I _supposed_ you have an idea.
>>
>> My idea is that bdrv_*() can only be called when you hold the BQL, or
>> for BlockDriverStates in an iothread the AioContext lock.
>>
>> Apparently dropping the BQL in migration code was introduced in Paolo's
>> commit 9b095037527.
>
> Damn. I reviewed it, so I am as guilty as the author.
> 10 years later without problems I will not blame that patch.
>
> I guess we changed something else that broke doing it without the lock.
>
> But no, I still don't have suggestions/ideas.
>
I do feel like the issue might be very difficult to trigger under normal
circumstances. Depending on the configuration and what you do in the
guest, aio_poll in a vCPU thread does not happen often and I imagine
snapshot-save is also not a super frequent operation for most people. It
still takes me a while to trigger the issue by issuing lots of pflash
writes and running snapshot-save in a loop, I'd guess about 30-60
snapshots. Another reason might be that generated co-wrappers were less
common in the past?
>> I'm not sure what this was supposed to improve in
>> the case of snapshots because the VM is stopped anyway.
Is it? Quoting Juan:> d- snapshots are a completely different beast,
that don't really stop
> the guest in the same way at that point, and sometimes it shows in
> this subtle details.
>> Would anything bad happen if we removed the BQL unlock/lock section in
>> qemu_savevm_state() again?
>
> Dunno.
>
> For what is worth, I can say that it survives migration-test, but don't
> ask me why/how/...
>
> Fiona, can you check if it fixes your troubles?
>
Just removing the single section in qemu_savevm_state() breaks even the
case where snapshot_save_job_bh() is executed in the main thread,
because ram_init_bitmaps() will call qemu_mutex_lock_iothread_impl()
which asserts that it's not already locked.
Also removing the lock/unlock pair in ram_init_bitmaps() seems to work.
But I'm not sure what else a full semantic revert of commit 9b095037527
would entail.
Best Regards,
Fiona
- Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll,
Fiona Ebner <=