qemu-devel

From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v13 19/25] replay: add BH oneshot event for block layer
Date: Tue, 5 Mar 2019 10:52:38 +0100
User-agent: Mutt/1.11.3 (2019-02-01)

Am 04.03.2019 um 13:17 hat Pavel Dovgalyuk geschrieben:
> > From: Kevin Wolf [mailto:address@hidden
> > Am 21.02.2019 um 12:05 hat Pavel Dovgalyuk geschrieben:
> > > Replay is capable of recording normal BH events, but sometimes
> > > there are single-use callbacks scheduled with the aio_bh_schedule_oneshot
> > > function. This patch enables recording and replaying such callbacks.
> > > The block layer uses these events for calling the completion function.
> > > Replaying these calls makes the execution deterministic.
> > >
> > > Signed-off-by: Pavel Dovgalyuk <address@hidden>
> > >
> > > --
> > >
> > > v6:
> > >  - moved the stub function to a separate file to fix the linux-user build
> > > v10:
> > >  - replaced all block layer aio_bh_schedule_oneshot calls
> > This still doesn't catch all instances, e.g. everything that goes
> > through aio_co_schedule() is missing.
> 
> It seems that everything else is synchronized by the blkreplay driver,
> which is mandatory when using block devices in rr mode.

Ah, yes, this is a good point. blkreplay goes through
replay_block_event(), which is where things get synchronised, right?

Does this mean that most of the places where you replaced a BH with your
new function don't actually need it either because they are called
through blkreplay and will go through replay_block_event() before
reaching the guest?
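A rough model of the synchronisation asked about here may help: while recording, the order in which block requests complete is written to the log; while replaying, host completions can arrive in any order, so each one is held back until the log says it is next. This is a hypothetical sketch, not QEMU's replay code: ReplaySketch and the sketch_* functions are invented for illustration, and request ids are assumed to be small enough to index an array directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the ordering guarantee; not QEMU's replay code. */
#define MAX_REQS 8

typedef struct {
    int log[MAX_REQS];       /* completion order captured while recording */
    int log_len;
    int next;                /* next log entry to deliver while replaying */
    bool pending[MAX_REQS];  /* completed on the host, not yet delivered */
    int delivered[MAX_REQS]; /* order in which the guest sees completions */
    int delivered_len;
} ReplaySketch;

/* Record mode: remember the real completion order and deliver at once. */
static void sketch_record_completion(ReplaySketch *s, int id)
{
    assert(id >= 0 && id < MAX_REQS && s->log_len < MAX_REQS);
    s->log[s->log_len++] = id;
    s->delivered[s->delivered_len++] = id;
}

/*
 * Replay mode: host completions may arrive in any order. Park each one
 * and release completions strictly in the order stored in the log.
 */
static void sketch_replay_completion(ReplaySketch *s, int id)
{
    assert(id >= 0 && id < MAX_REQS);
    s->pending[id] = true;
    while (s->next < s->log_len && s->pending[s->log[s->next]]) {
        int ready = s->log[s->next++];
        s->pending[ready] = false;
        s->delivered[s->delivered_len++] = ready;
    }
}
```

For example, if the recording saw completions 2, 0, 1 and the host later finishes them as 0, 1, 2, nothing reaches the guest until request 2 is done, and then 2, 0, 1 are released in the recorded order.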

> > But I fully expect this to get broken anyway all the time because nobody
> > understands which function to use, and if it works for your special case
> > now and we'll fix other stuff as you encounter it, maybe that's good
> > enough for you.
> 
> This problem exists in every subsystem, and it is OK for now, while
> record/replay is not mature enough and not familiar to others. When
> virtual devices are updated, developers may get the loadvm/savevm
> implementation wrong. For example, loading the audio device state may
> shift the phase of the output signal. Nobody will notice that bug
> during migration, but it is revealed when we use record/replay.
> 
> We can't cover everything with record/replay tests. Most new bugs only
> show up in complex configurations after billions of executed
> instructions. But once this feature is available out of the box,
> we'll at least get more smoke testing.

Ok.

> > > @@ -1349,8 +1351,8 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
> > >
> > >      acb->has_returned = true;
> > >      if (acb->rwco.ret != NOT_DONE) {
> > > -        aio_bh_schedule_oneshot(blk_get_aio_context(blk),
> > > -                                blk_aio_complete_bh, acb);
> > > +        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> > > +                                         blk_aio_complete_bh, acb);
> > >      }
> > 
> > This, and a few other places that you convert, are in fast paths and add
> > some calls that are unnecessary for non-replay cases.
> 
> I don't think that this can cause a noticeable slowdown, but we can run
> the tests if you want. We have a test suite that performs
> disk-intensive computation. It was created to measure the effect of
> running BH callbacks through the virtual timer infrastructure.

I think this requires quite fast storage to possibly make a difference.
Or if you don't have that, maybe a ramdisk or even a null-co:// backend
could do the trick. Maybe null-co:// is actually the best option.

Anyway, if it's not too much work for you, running some tests would be
good.

> > I wonder if we could make replay optional in ./configure and then make
> > replay_bh_schedule_oneshot_event() a static inline function that can get
> > optimised away at compile time if the feature is disabled.
> 
> It is coupled with icount. However, some icount calls also lie on
> the fast paths and are completely useless when icount is not enabled.

Well, the common fast path is KVM, which doesn't have icount at all, so
that might make it less critical. :-)

I get your point, though; maybe that just means that both should be
possible to disable at configure time.
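The configure-time idea can be sketched as a static inline wrapper whose replay branch is guarded by a compile-time constant. Everything here is an assumption for illustration: CONFIG_REPLAY as a 0/1 macro, a simplified callback type without an AioContext argument, and the stand-ins schedule_oneshot() and replay_add_bh_event(), which run the callback immediately instead of going through a real event loop or replay log.

```c
#include <assert.h>

/*
 * Hypothetical configure-time switch (an assumption, not an existing QEMU
 * option in this form). Build with -DCONFIG_REPLAY=0 to compile it out.
 */
#ifndef CONFIG_REPLAY
#define CONFIG_REPLAY 1
#endif

typedef void SketchBHFunc(void *opaque);

typedef enum {
    SKETCH_MODE_NONE,
    SKETCH_MODE_RECORD,
    SKETCH_MODE_PLAY,
} SketchReplayMode;

static SketchReplayMode sketch_replay_mode = SKETCH_MODE_NONE;
static int direct_calls;  /* how often the plain path was taken */
static int replay_calls;  /* how often the replay path was taken */

/* Stand-in for aio_bh_schedule_oneshot(); runs the callback immediately. */
static void schedule_oneshot(SketchBHFunc *cb, void *opaque)
{
    direct_calls++;
    cb(opaque);
}

/* Stand-in for the replay path; real code would log or wait for an event. */
static void replay_add_bh_event(SketchBHFunc *cb, void *opaque)
{
    assert(sketch_replay_mode != SKETCH_MODE_NONE);
    replay_calls++;
    cb(opaque);
}

static inline void sketch_bh_schedule_oneshot_event(SketchBHFunc *cb,
                                                    void *opaque)
{
    /*
     * With CONFIG_REPLAY == 0 this condition is constant false, so the
     * compiler can drop the branch and inline a plain schedule_oneshot().
     */
    if (CONFIG_REPLAY && sketch_replay_mode != SKETCH_MODE_NONE) {
        replay_add_bh_event(cb, opaque);
    } else {
        schedule_oneshot(cb, opaque);
    }
}

/* Example callback: counts how many times it ran. */
static void count_cb(void *opaque)
{
    (*(int *)opaque)++;
}
```

With CONFIG_REPLAY left at 1, the fast path pays only for a check of the mode variable; building with -DCONFIG_REPLAY=0 turns the condition into a constant, so an optimising compiler can reduce the wrapper to a direct schedule_oneshot() call.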

Kevin


