qemu-block

Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-comp


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
Date: Thu, 8 Apr 2021 19:58:56 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.9.0

08.04.2021 19:20, Max Reitz wrote:
block-job-complete can only be applied when the job is READY, not when
it is on STANDBY (ready, but paused).  Draining a job technically pauses
it (which makes a READY job enter STANDBY), and ending the drained
section does not synchronously resume it, but only schedules the job,
which will then be resumed.  So attempting to complete a job immediately
after a drained section may sometimes fail.

That is bad at least because users cannot really work nicely around
this: A job may be paused and resumed at any time, so waiting for the
job to be in the READY state and then issuing a block-job-complete poses
a TOCTTOU problem.  The only way around it would be to issue
block-job-complete until it no longer fails due to the job being in the
STANDBY state, but that would not be nice.

We can solve the problem by allowing block-job-complete to be invoked on
jobs that are on STANDBY, if that status is the result of a drained
section (not because the user has paused the job), and that section has
ended.  That is, if the job is on STANDBY, but scheduled to be resumed.

Perhaps we could actually just directly allow this, seeing that mirror
is the only user of ready/complete, and that mirror_complete() could
probably work under the given circumstances, but there may be many side
effects to consider.

It is simpler to add a function job_wait_unpaused() that waits for the
job to be resumed (under said circumstances), and to make
qmp_block_job_complete() use it to delay job_complete() until then.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
  include/qemu/job.h | 15 +++++++++++++++
  blockdev.c         |  3 +++
  job.c              | 42 ++++++++++++++++++++++++++++++++++++++++++
  3 files changed, 60 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index efc6fa7544..cf3082b6d7 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
   */
  int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
+/**
+ * If the job has been paused because of a drained section, and that
+ * section has ended, wait until the job is resumed.
+ *
+ * Return 0 if the job is not paused, or if it has been successfully
+ * resumed.
+ * Return an error if the job has been paused in such a way that
+ * waiting will not resume it, i.e. if it has been paused by the user,
+ * or if it is still drained.
+ *
+ * Callers must be in the home AioContext and hold the AioContext lock
+ * of job->aio_context.
+ */
+int job_wait_unpaused(Job *job, Error **errp);
+
  #endif
diff --git a/blockdev.c b/blockdev.c
index a57590aae4..c0cc2fa364 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp)
          return;
      }
+    if (job_wait_unpaused(&job->job, errp) < 0) {
+        return;
+    }
      trace_qmp_block_job_complete(job);
      job_complete(&job->job, errp);
      aio_context_release(aio_context);
diff --git a/job.c b/job.c
index 289edee143..1ea30fd294 100644
--- a/job.c
+++ b/job.c
@@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
      job_unref(job);
      return ret;
  }
+
+int job_wait_unpaused(Job *job, Error **errp)
+{
+    /*
+     * Only run this function from the main context, because this is
+     * what we need, and this way we do not have to think about what
+     * happens if the user concurrently pauses the job from the main
+     * monitor.
+     */
+    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+
+    /*
+     * Quick path (e.g. so we do not get an error if pause_count > 0
+     * but the job is not even paused)
+     */
+    if (!job->paused) {
+        return 0;
+    }
+
+    /* If the user has paused the job, waiting will not help */
+    if (job->user_paused) {
+        error_setg(errp, "Job '%s' has been paused by the user", job->id);
+        return -EBUSY;
+    }
+
+    /* Similarly, if the job is still drained, waiting will not help either */
+    if (job->pause_count > 0) {
+        error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
+        return -EBUSY;
+    }
+
+    /*
+     * This function is specifically for waiting for a job to be
+     * resumed after a drained section.  Ending the drained section
+     * includes a job_enter(), which schedules the job loop to be run,
+     * and once it does, job->paused will be cleared.  Therefore, we
+     * do not need to invoke job_enter() here.
+     */
+    AIO_WAIT_WHILE(job->aio_context, job->paused);
+
+    return 0;
+}


Hmm.. It seems that when job->pause_count becomes 0, job_enter() is called, so
the period when pause_count is 0 but paused is still true should be relatively
short. And the patch doesn't help if the user calls job-complete during a
drained section. So it looks like the patch will help relatively seldom.. Or am
I missing something?
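
To make the point concrete, here is a toy model of the three checks that
job_wait_unpaused() performs before it waits. This is not QEMU code: the
ToyJobState struct, the ToyWaitOutcome enum and toy_wait_outcome() are made up
for illustration, and only the order of the checks is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three Job fields the patch looks at. */
typedef struct ToyJobState {
    bool paused;       /* job is currently paused (STANDBY if it was READY) */
    bool user_paused;  /* the pause was requested by the user */
    int pause_count;   /* > 0 while a pause (e.g. a drained section) is active */
} ToyJobState;

typedef enum {
    TOY_NO_WAIT_NEEDED, /* job is running, complete can proceed at once */
    TOY_WAIT_HELPS,     /* drain has ended, the resume is only scheduled */
    TOY_WAIT_FAILS      /* user pause or still drained: -EBUSY in the patch */
} ToyWaitOutcome;

/* Same order of checks as in the patch's job_wait_unpaused(). */
static ToyWaitOutcome toy_wait_outcome(const ToyJobState *s)
{
    if (!s->paused) {
        return TOY_NO_WAIT_NEEDED;
    }
    if (s->user_paused || s->pause_count > 0) {
        return TOY_WAIT_FAILS;
    }
    return TOY_WAIT_HELPS;
}
```

Only the state with paused == true, user_paused == false and pause_count == 0
ends up in TOY_WAIT_HELPS, and that is the short window I mean above. A
job-complete issued inside the drained section still gets an error.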

The job-complete command is async. Can we instead just add a boolean like
job->completion_requested, set it if job-complete is called in the STANDBY
state, and then have job_resume call job_complete automatically if this boolean
is true?
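
A rough sketch of the idea, as a self-contained toy model rather than a patch
against job.c. Everything here is hypothetical: the ToyJob struct and the
toy_job_*() helpers only imitate the READY/STANDBY behaviour, and
completion_requested is the proposed new field, which does not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for QEMU's Job; only the fields this sketch needs. */
typedef struct ToyJob {
    bool ready;                 /* job has reached the READY phase */
    bool paused;                /* true while the job is on STANDBY */
    bool completion_requested;  /* hypothetical flag proposed above */
    int completions;            /* how often completion actually ran */
} ToyJob;

/* Stand-in for the real completion work; must not run while paused. */
static void toy_job_do_complete(ToyJob *job)
{
    assert(job->ready && !job->paused);
    job->completions++;
}

/*
 * job-complete: complete right away when READY, or remember the request
 * when on STANDBY. Returns 0 on success, -1 if the job is not ready yet.
 */
static int toy_job_complete(ToyJob *job)
{
    if (!job->ready) {
        return -1;
    }
    if (job->paused) {
        job->completion_requested = true;
        return 0;
    }
    toy_job_do_complete(job);
    return 0;
}

static void toy_job_pause(ToyJob *job)
{
    job->paused = true;
}

/* On resume, run a completion that was requested while paused. */
static void toy_job_resume(ToyJob *job)
{
    job->paused = false;
    if (job->completion_requested) {
        job->completion_requested = false;
        toy_job_do_complete(job);
    }
}
```

With this, a job-complete that arrives during a drained section succeeds
immediately from the caller's point of view, and the completion itself runs
once the job is resumed. Whether a user pause should be treated the same way
as a drain is left open in this sketch.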

--
Best regards,
Vladimir


