qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex


From: Emanuele Giuseppe Esposito
Subject: Re: [RFC PATCH 0/6] job: replace AioContext lock with job_mutex
Date: Mon, 12 Jul 2021 10:41:46 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1



On 08/07/2021 15:04, Stefan Hajnoczi wrote:
On Thu, Jul 08, 2021 at 01:32:12PM +0200, Paolo Bonzini wrote:
On 08/07/21 12:36, Stefan Hajnoczi wrote:
What is very clear from this patch is that it
is strictly related to the brdv_* and lower level calls, because
they also internally check or even use the aiocontext lock.
Therefore, in order to make it work, I temporarly added some
aiocontext_acquire/release pair around the function that
still assert for them or assume they are hold and temporarly
unlock (unlock() - lock()).

Sounds like the issue is that this patch series assumes AioContext locks
are no longer required for calling the blk_*()/bdrv_*() APIs? That is
not the case yet, so you had to then add those aio_context_lock() calls
back in elsewhere. This approach introduces unnecessary risk. I think we
should wait until blk_*()/bdrv_*() no longer requires the caller to hold
the AioContext lock before applying this series.

In general I'm in favor of pushing the lock further down into smaller and
smaller critical sections; it's a good approach to make further audits
easier until it's "obvious" that the lock is unnecessary.  I haven't yet
reviewed Emanuele's patches to see if this is what he's doing where he's
adding the acquire/release calls, but that's my understanding of both his
cover letter and your reply.

The problem is the unnecessary risk. We know what the goal is for
blk_*()/bdrv_*() but it's not quite there yet. Does making changes in
block jobs help solve the final issues with blk_*()/bdrv_*()?

Correct me if I am wrong, but it seems to me that the bdrv_*()/blk_*() operation mostly take care of building, modifying and walking the bds graph. So since graph nodes can have multiple AioContext, it makes sense that we have a lock when modifying the graph, right?

If so, we can simply try to replace the AioContext lock with a graph lock, or something like that. But I am not sure of this.

Emanuele

If yes, then it's a risk worth taking. If no, then spending time
developing interim code, reviewing those patches, and risking breakage
doesn't seem worth it. I'd rather wait for blk_*()/bdrv_*() to be fully
complete and then see patches that delete aio_context_acquire() in most
places or add locks in the remaining places where the caller was relying
on the AioContext lock.

Stefan





reply via email to

[Prev in Thread] Current Thread [Next in Thread]