Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay


From: Pavel Dovgalyuk
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date: Fri, 12 Feb 2016 11:33:23 +0300

> From: Kevin Wolf [mailto:address@hidden]
> On 11.02.2016 at 12:00, Pavel Dovgalyuk wrote:
> > > From: Kevin Wolf [mailto:address@hidden]
> > > On 11.02.2016 at 07:05, Pavel Dovgalyuk wrote:
> > > > > From: Kevin Wolf [mailto:address@hidden]
> > > > > On 10.02.2016 at 13:51, Pavel Dovgalyuk wrote:
> > > > > > However, I don't yet understand which layer you are proposing as the
> > > > > > candidate for record/replay. Which functions should be changed?
> > > > > > I would like to investigate this approach, but I don't get it yet.
> > > > >
> > > > > At the core, I wouldn't change any existing function, but introduce a
> > > > > new block driver. You could copy raw_bsd.c for a start and then tweak
> > > > > it. Leave out functions that you don't want to support, and add the
> > > > > necessary magic to .bdrv_co_readv/writev.
> > > > >
> > > > > Something like this (can probably be generalised for more than just
> > > > > reads as the part after the bdrv_co_readv() call should be the same
> > > > > for reads, writes and any other request types):
> > > > >
> > > > > int blkreplay_co_readv()
> > > > > {
> > > > >     BlockReplayState *s = bs->opaque;
> > > > >     int reqid = s->reqid++;
> > > > >
> > > > >     bdrv_co_readv(bs->file, ...);
> > > > >
> > > > >     if (mode == record) {
> > > > >         log(reqid, time);
> > > > >     } else {
> > > > >         assert(mode == replay);
> > > > >         bool *done = req_replayed_list_get(reqid);
> > > > >         if (done) {
> > > > >             *done = true;
> > > > >         } else {
> > > > >             req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > >             qemu_coroutine_yield();
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > /* called by replay.c */
> > > > > int blkreplay_run_event()
> > > > > {
> > > > >     if (mode == replay) {
> > > > >         co = req_completed_list_get(e.reqid);
> > > > >         if (co) {
> > > > >             qemu_coroutine_enter(co);
> > > > >         } else {
> > > > >             bool done = false;
> > > > >             req_replayed_list_insert(e.reqid, &done);
> > > > >             /* wait synchronously for completion */
> > > > >             while (!done) {
> > > > >                 aio_poll();
> > > > >             }
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > Where we could consider changing existing code is that it might be
> > > > > desirable to automatically put an instance of this block driver on top
> > > > > of every block device when record/replay is used. If we don't do that,
> > > > > you need to explicitly specify -drive driver=blkreplay,...
> > > >
> > > > As far as I understand, all synchronous read/write requests are also
> > > > passed through this coroutine layer.
> > >
> > > Yes, all read/write requests go through the same function internally, no
> > > matter which external interface was used.
> > >
> > > > It means that every disk access in the replay phase must match the
> > > > recording phase.
> > >
> > > Right. If I'm not mistaken, this was the fundamental requirement you
> > > have, so I wouldn't have suggested this otherwise.
> > >
> > > > Record/replay is intended to be used for debugging and analysis.
> > > > When execution is replayed, the guest machine cannot notice analysis
> > > > overhead.
> > > > Some analysis methods may include reading the disk image. E.g., the
> > > > qemu-based analysis framework DECAF uses sleuthkit for disk forensics
> > > > ( https://github.com/sycurelab/DECAF ).
> > > > If a similar framework is used with replay, its forensic disk access
> > > > operations won't work if we record/replay the coroutines.
> > >
> > > Sorry, I'm not sure if I can follow.
> > >
> > > If such analysis software runs in the guest, it's not a replay any more
> > > and I completely fail to see what you're doing.
> > >
> > > If it's a qemu component independent from the guest, then my method
> > > gives you a clean way to bypass the replay driver that wouldn't be
> > > possible with yours.
> >
> > The second one. qemu may be extended with some components that
> > perform guest introspection.
> >
> > > If your plan was to record/replay only async requests and then use sync
> > > requests to bypass the record/replay, let me clearly state that this is
> > > the wrong approach: There are still guest devices which do synchronous
> > > I/O and need to be considered in the replay log, and you shouldn't
> > > prevent the analysis code from using AIO (in fact, using sync I/O in new
> > > code is very much frowned upon).
> >
> > Why do guest synchronous requests have to be recorded?
> > Aren't they completely deterministic?
> 
> Good point. I think you're right in practice. In theory, with dataplane
> (i.e. when running the request in a separate thread) it could happen,
> but I guess that isn't very compatible with replay anyway - and at
> first sight I couldn't see it performing synchronous requests either.
> 
> > > I can explain in more detail what the block device structure looks like
> > > and how to access an image with and without record/replay, but first
> > > please let me know whether I guessed correctly what your problem is. Or if I
> > > missed your point, can you please describe in detail a case that
> > > wouldn't work?
> >
> > You have understood it correctly.
> > And what is the solution for bypassing one of the layers from a component
> > that should not affect the replay?
> 
> For this, you need to understand how block drivers are stacked in qemu.
> Each driver in the stack has a separate struct BlockDriverState, which
> can be used to access its data. You could hook up things like this:
> 
>       virtio-blk              NBD server
>     --------------           ------------
>           |                        |
>           v                        |
>     +------------+                 |
>     | blkreplay  |                 |
>     +------------+                 |
>           |                        |
>           v                        |
>     +------------+                 |
>     |   qcow2    | <---------------+
>     +------------+
>           |
>           v
>     +------------+
>     | raw-posix  |
>     +------------+
>           |
>           v
>     --------------
>       filesystem
> 
> As you see, what I've chosen for the external analysis interface is just
> an NBD server as this is the component that we already have today. You
> could hook up any other (new) code there; the important part is that it
> doesn't work on the BDS of the blkreplay driver, but directly on the BDS
> of the qcow2 driver.
> 
> On the command line, it could look like this (this assumes that we don't
> add syntactic sugar that creates the blkreplay part automatically - we
> can always do that):
> 
>     -drive file=image.qcow2,if=none,id=img-direct
>     -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
>     -device virtio-blk-pci,drive=img-blkreplay

Are there any hints on writing a driver that takes these options?
I can't figure out how to write the _open function for it.
The blkdebug driver seems similar, but it receives the image name directly,
without referencing the lower level.
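
A toy model of the layering described above may make the question
concrete. It is not QEMU code: Node, node_find, node_open and node_read
are names made up for illustration, and the image= option is modeled as
a plain lookup by id, which is an assumption about how the reference
would be resolved. Only requests entering through the blkreplay node get
a request id; a component that opens the qcow2 node directly bypasses it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these types or functions are QEMU APIs. */
#define MAX_NODES 8

typedef struct Node Node;
struct Node {
    const char *id;      /* value of the id= option */
    const char *driver;  /* "qcow2", "blkreplay", ... */
    Node *child;         /* lower layer, NULL for the bottom node */
    int next_reqid;      /* used by the blkreplay layer only */
    int reads_seen;      /* number of requests that reached this node */
};

static Node nodes[MAX_NODES];
static size_t node_count;

/* Look up an already created node by its id= value. */
static Node *node_find(const char *id)
{
    for (size_t i = 0; i < node_count; i++) {
        if (strcmp(nodes[i].id, id) == 0) {
            return &nodes[i];
        }
    }
    return NULL;
}

/*
 * Create a node. image_id names the lower layer (the image= option in
 * the example command line) or is NULL. Returns NULL on failure.
 */
static Node *node_open(const char *id, const char *driver,
                       const char *image_id)
{
    Node *child = NULL;

    assert(id && driver);
    if (node_count == MAX_NODES || node_find(id)) {
        return NULL;
    }
    if (image_id) {
        child = node_find(image_id);  /* resolve the reference by id */
        if (!child) {
            return NULL;
        }
    }

    Node *n = &nodes[node_count++];
    n->id = id;
    n->driver = driver;
    n->child = child;
    n->next_reqid = 0;
    n->reads_seen = 0;
    return n;
}

/*
 * Send one read through a node and the layers below it. Returns the
 * request id assigned by a blkreplay layer, or -1 if the request did
 * not pass through one.
 */
static int node_read(Node *n)
{
    int reqid = -1;

    n->reads_seen++;
    if (strcmp(n->driver, "blkreplay") == 0) {
        /* A real driver would log or match this id here. */
        reqid = n->next_reqid++;
    }
    if (n->child) {
        int lower = node_read(n->child);
        if (reqid < 0) {
            reqid = lower;
        }
    }
    return reqid;
}
```

In this model the guest device would call node_read() on the node
opened as img-blkreplay, while an analysis component would call it on
img-direct, so its accesses never consume a request id.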


Pavel Dovgalyuk