qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use blo


From: John Snow
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context
Date: Thu, 6 Oct 2016 16:22:33 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0



On 10/05/2016 10:02 AM, Kevin Wolf wrote:
Am 01.10.2016 um 00:00 hat John Snow geschrieben:
There are a few places where we're fishing it out for ourselves.
Let's not do that and instead use the helper.

Signed-off-by: John Snow <address@hidden>

That change makes a difference when the block job is running its
completion part after block_job_defer_to_main_loop(). The commit message
could be more explicit about whether this is intentional or whether this
case is expected to happen at all.

I suspect that if it can happen, this is a bug fix. Please check and
update the commit message accordingly.


Because I'm bad with being concise, I wrote a TLDR at the bottom. Otherwise, enjoy this wall of text.

Kevin


Intentional under the premise of:

(1) Acquiring the context for which a job is not actually running under is likely incorrect (or at the very least misleading), and

(2) If using the main thread context for any would-be callers is incorrect, this is a problem with the job lifetime that needs to be corrected anyway.


In general, if we are acquiring the context to secure exclusive access to the BlockJob state itself, using the getter here is perfectly safe. If we are acquiring context for other reasons, we need to consider more carefully.


The callers are:

(A) bdrv_drain_all (block/io)

Obtains context for the sake of pause/resume. Pauses all jobs before draining all BDSes. For starters, pausing a job that has deferred to main has no effect (and neither does resuming). This usage appears slightly erroneous, though, in that if we are not running from the main thread, we are definitely not securing exclusive rights to the block job. We could, in theory, race on reads/writes to the pause count field. This would be a bugfix.

(B) find_block_job (all monitor context)

        Acquires context as a courtesy for its callers:
        - qmp_block_job_set_speed
        - qmp_block_job_cancel
        - qmp_block_job_pause
        - qmp_block_job_resume
        - qmp_block_job_complete

In an "already deferred to main" sense... in general, if the job has already deferred to main we don't need to acquire the block's context to get safe access to the job, because we're already running in the main context. Further, none of these functions actually have any meaning for a job in such a state.

        - set_speed: Sets speed parameters, harmless either way.
- cancel: Will set the cancelled boolean, reset iostatus, then attempt to enter the job. Since a job that enters the main loop remains busy, the enter is a NOP. The BlockBackend AIO context here is therefore extraneous, and the getter is safe.
        - pause: Only increments a counter, and will have no effect.
- resume: Decrements a counter. Attempts to enter(), but as stated above this is a NOP. - complete: Calls .complete(), for which the only implementation is mirror_complete. Uh, this actually seems messy. Looks like there's nothing to prevent us from calling this after we've already told it to complete once. This could be a legitimate bug that this patch does nothing in particular to address. If complete() is shored up such that it can be called precisely once, this becomes safe.

(C) qmp_query_block_jobs (monitor context)

        Just a getter. Using get_context is safe in either state.

(D) run_block_job (qemu-img)

Never called when the context is in the main loop anyway. Effectively no change here.



So, with the exception of .complete, I think this is a safe change as it stands... However... Paolo wants to complicate my life and get rid of this getter for his own fiendish purposes. He suggests pushing down context acquisition into blockjob.c directly for any QMP callers:

- qmp_block_job_set_speed -> block_job_set_speed
- qmp_block_job_cancel -> block_job_cancel
- qmp_block_job_pause -> block_job_user_pause
- qmp_block_job_resume -> block_job_user_resume
- qmp_block_job_complete -> block_job_complete
- qmp_query_block_jobs -> block_job_query

Most of these have only one caller in the QMP layer:

block_job_set_speed
block_job_user_pause
block_job_user_resume
block_job_query

These can easily just take the context they need, removing external uses of job->blk for purposes of acquiring the context.

block_job_cancel and block_job_complete are different.

block_job_cancel is called in many places, but we can just add a similar block_job_user_cancel if we wanted a version which takes care to acquire context and one that does not. (Or we could just acquire the context regardless, but Paolo warned me ominously that recursive locks are EVIL. He sounded serious.)

block_job_complete has no direct callers outside of QMP, but it is also used as a callback by block_job_complete_sync, used in qemu-img for run_block_job. I can probably rewrite qemu_img to avoid this usage.



TLDR:
- This change should be perfectly safe, but Paolo wants to get rid of this usage anyway.
- At least 5/6 uses of external context grabbing can be internalized easily.
- qemu-img's run_block_job needs to be refactored a bit, though I don't have an idea for that yet, but as you pointed out it needs to be done for the public/private split anyway.
- block_job_complete needs to be touched up no matter what we do.
- The aio_context getter can probably be removed entirely as per Paolo's wishes, but I'll have to change bdrv_drain_all a bit. A block_job_pause_all and block_job_resume all would work, though that's a bit special purpose. I could craft up a block_job_apply_all for the purpose instead. (e.g. block_job_apply_all(block_job_pause))

I think that answers everyone's questions...
--js



reply via email to

[Prev in Thread] Current Thread [Next in Thread]