qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH 17/42] job: Move defer_to_main_loop to Job


From: Kevin Wolf
Subject: Re: [Qemu-block] [PATCH 17/42] job: Move defer_to_main_loop to Job
Date: Tue, 15 May 2018 14:17:25 +0200
User-agent: Mutt/1.9.1 (2017-09-22)

Am 14.05.2018 um 17:52 hat Max Reitz geschrieben:
> On 2018-05-09 18:26, Kevin Wolf wrote:
> > Signed-off-by: Kevin Wolf <address@hidden>
> > ---
> >  include/block/blockjob.h     |  5 ----
> >  include/block/blockjob_int.h | 19 ---------------
> >  include/qemu/job.h           | 20 ++++++++++++++++
> >  block/backup.c               |  7 +++---
> >  block/commit.c               | 11 +++++----
> >  block/mirror.c               | 15 ++++++------
> >  block/stream.c               | 14 +++++------
> >  blockjob.c                   | 57 
> > ++++----------------------------------------
> >  job.c                        | 33 +++++++++++++++++++++++++
> >  tests/test-bdrv-drain.c      |  7 +++---
> >  tests/test-blockjob-txn.c    | 13 +++++-----
> >  tests/test-blockjob.c        |  7 +++---
> >  12 files changed, 98 insertions(+), 110 deletions(-)
> 
> [...]
> 
> > diff --git a/block/commit.c b/block/commit.c
> > index 85baea8f92..d326766e4d 100644
> > --- a/block/commit.c
> > +++ b/block/commit.c
> > @@ -72,9 +72,10 @@ typedef struct {
> >      int ret;
> >  } CommitCompleteData;
> >  
> > -static void commit_complete(BlockJob *job, void *opaque)
> > +static void commit_complete(Job *job, void *opaque)
> >  {
> > -    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
> > +    CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
> 
> Now this is just abuse.
> 
> ...but it's not the first time someone packs two container_of() into
> one, it appears.  So, whatever, I guess.

I don't think it's abuse. Why wouldn't I directly cast to the type that
I really want instead of casting to each of the uninteresting parent
classes, too?

> > @@ -170,3 +171,35 @@ void job_unref(Job *job)
> >          g_free(job);
> >      }
> >  }
> > +
> > +typedef struct {
> > +    Job *job;
> > +    JobDeferToMainLoopFn *fn;
> > +    void *opaque;
> > +} JobDeferToMainLoopData;
> > +
> > +static void job_defer_to_main_loop_bh(void *opaque)
> > +{
> > +    JobDeferToMainLoopData *data = opaque;
> > +    Job *job = data->job;
> > +    AioContext *aio_context = job->aio_context;
> > +
> > +    /* Prevent race with job_defer_to_main_loop() */
> > +    aio_context_acquire(aio_context);
> 
> I don't have a good feeling about this.  The original code had this
> comment above an aio_context_acquire() for a context that might
> decidedly not have anything to do with the BB's context;
> block_job_defer_to_main_loop()'s description was that it would acquire
> the latter, so why did it acquire the former at all?
> 
> We wouldn't need this comment here at all, because acquiring this
> AioContext is part of the interface.  That's why I don't have a good
> feeling.

To be honest, I don't fully understand what the old code was trying to
do or what race it was talking about, because I don't see any potential
race with job_defer_to_main_loop() (neither in the old nor in the new
code).

My suspicion is that Stefan solved the race that you reported [1] (and
which was not with job_defer_to_main_loop(), but with random code that
runs between that and the BH) just by adding some more code that made
the scenario safe, but missed that this also made the existing code
redundant. In other words, I think taking data->aio_context didn't serve
a purpose even in the old code.

The only thing it could possibly protect is the access of data->job->bs,
but that's not something that changes between job_defer_to_main_loop()
and the execution of the BH.

[1] https://lists.gnu.org/archive/html/qemu-devel/2014-10/msg00260.html

> The best explanation I can come up with is that the original code
> acquired the AioContext both of the block device at the time of the BH
> (because that needs to be done), and at the time of
> block_job_defer_to_main_loop() -- because the latter is probably the
> context the block_job_defer_to_main_loop() call came from, so it should
> be (b)locked.
> 
> But if that's the case, then the same should be done here.  The context
> of the job may change between scheduling the BH and the BH being
> executed, so we might lock a different context here than the one
> job_defer_to_main_loop() ran in (i.e., job->aio_context at the time of
> job_defer_to_main_loop() running).  And maybe we should lock that old
> context, too -- just like block_job_defer_to_main_loop_bh() did.

Why should we lock the old context? We don't access anything protected
by it. Even the data->job->bs access has gone away because we now have
job->aio_context.

Kevin

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]