qemu-stable
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-stable] [Qemu-devel] [PATCH] job: Fix nested aio_poll() hangin


From: Eric Blake
Subject: Re: [Qemu-stable] [Qemu-devel] [PATCH] job: Fix nested aio_poll() hanging in job_txn_apply
Date: Tue, 21 Aug 2018 11:15:21 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 08/21/2018 01:45 AM, Fam Zheng wrote:
All callers have acquired ctx already. Doing that again results in
aio_poll() hang. This fixes the problem that a BDRV_POLL_WHILE() in the
callback cannot make progress because ctx is recursively locked, for
example, when drive-backup finishes.

Cc: address@hidden
Reported-by: Gu Nini <address@hidden>
Signed-off-by: Fam Zheng <address@hidden>
---
  job.c | 18 +++++-------------
  1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/job.c b/job.c
index fa671b431a..b07c5d0ffc 100644
--- a/job.c
+++ b/job.c
@@ -136,21 +136,13 @@ static void job_txn_del_job(Job *job)
      }
  }
-static int job_txn_apply(JobTxn *txn, int fn(Job *), bool lock)
+static int job_txn_apply(JobTxn *txn, int fn(Job *))

Is it worth adding a comment here that callers must have already acquired the job context?

However, I was unable to quickly audit whether all callers really did have the lock (it balloons into whether all callers of job_finalize() have the lock), so I'm reluctant to give R-b.

@@ -857,10 +849,10 @@ static void job_completed_txn_success(Job *job)
          assert(other_job->ret == 0);
      }
- job_txn_apply(txn, job_transition_to_pending, false);
+    job_txn_apply(txn, job_transition_to_pending);
/* If no jobs need manual finalization, automatically do so */
-    if (job_txn_apply(txn, job_needs_finalize, false) == 0) {
+    if (job_txn_apply(txn, job_needs_finalize) == 0) {
          job_do_finalize(job);

That said, the change makes sense here: since the first direct call to job_txn_apply() did not need to lock, why should the second indirect call through job_do_finalize() need it?

Or, is the fix to have job_do_finalize() gain a bool parameter, where job_completed_txn_success() would pass false to that parameter, while job_finalize() would pass true (again, going back to auditing whether all callers of job_finalize() have already acquired the context).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



reply via email to

[Prev in Thread] Current Thread [Next in Thread]