qemu-devel

Re: [Qemu-devel] [PATCH 1/3 v4] Expose a mechanisem to trace block write


From: Liran Schour
Subject: Re: [Qemu-devel] [PATCH 1/3 v4] Expose a mechanisem to trace block writes
Date: Thu, 22 Oct 2009 13:01:33 +0200

address@hidden wrote on 21/10/2009 20:21:09:

> Anthony Liguori <address@hidden>
> Sent by: address@hidden
>
> 21/10/2009 20:21
>
> To
>
> Liran Schour/Haifa/address@hidden
>
> cc
>
> address@hidden
>
> Subject
>
> Re: [Qemu-devel] [PATCH 1/3 v4] Expose a mechanisem to trace block writes
>
> Hi Liran,
>
> address@hidden wrote:
> > To support live migration without shared storage we need to be able to trace
> > writes to disk while migrating. This Patch expose handler registration for
> > above components to be notified about block writes.
> >
> > diff --git a/block.c b/block.c
> > index 33f3d65..bf5f7a6 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -61,6 +61,8 @@ BlockDriverState *bdrv_first;
> >
> >  static BlockDriver *first_drv;
> >
> > +static BlockDriverDirtyHandler *bdrv_dirty_handler = NULL;
> > +
> >
>
> Should be a property of a BlockDriverState.  IOW, we should register a
> dirty callback for each block device we're interested in.

Agree, I will fix that.

> >  int path_is_absolute(const char *path)
> >  {
> >      const char *p;
> > @@ -626,6 +628,10 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
> >      if (bdrv_check_request(bs, sector_num, nb_sectors))
> >          return -EIO;
> >
> > +    if(bdrv_dirty_handler != NULL) {
> > +      bdrv_dirty_handler(bs, sector_num, nb_sectors);
> > +    }
> > +
> >      return drv->bdrv_write(bs, sector_num, buf, nb_sectors);
> >  }
> >
>
> CodingStyle seems off.
>
> We have to be careful in these cases to check for whether we're dealing
> with BDRV_FILE.  In the case of something like qcow2, you would get two
> dirty callbacks as this code stands.  The first would be what the guest
> actually writes and then the second (and potentially third) would be
> qcow2 metadata updates along with writing the actual data to disk.

It seems to me that registering a callback for each device we are
interested in will solve the problem. (In the current code the bug is
avoided by a check I do in block-migration.c on the device type and name,
but I agree that as it stands it is buggy.)
I will register each device separately to solve the problem.

> In terms of an interface, I think it would be better to register a
> bitmap and to poll the block driver periodically to see which bits have
> changed.  This is how ram dirty tracking works and I think keeping these
> interfaces consistent is a good thing.

There are advantages to letting the higher-level component manage its own
dirty tracking mechanism. This way more than one component can register
itself.
I am thinking of changing the registration mechanism to allow more than one
handler to be registered, the same way register_savevm_live works.
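
To make the per-device, multi-handler idea concrete, here is a rough sketch
of what I have in mind. Everything in it is an assumption for illustration:
the struct layout, the names bdrv_register_dirty_handler and
bdrv_notify_dirty, and the opaque pointer are hypothetical and are not the
real block.c definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real QEMU definitions. */
typedef struct BlockDriverState BlockDriverState;
typedef struct DirtyHandlerEntry DirtyHandlerEntry;

typedef void DirtyHandler(BlockDriverState *bs, int64_t sector_num,
                          int nb_sectors, void *opaque);

struct DirtyHandlerEntry {
    DirtyHandler *fn;
    void *opaque;
    DirtyHandlerEntry *next;
};

struct BlockDriverState {
    DirtyHandlerEntry *dirty_handlers;  /* per-device list, NULL if empty */
};

/* Add a handler to one device. Returns 0 on success, -1 if out of memory. */
static int bdrv_register_dirty_handler(BlockDriverState *bs, DirtyHandler *fn,
                                       void *opaque)
{
    DirtyHandlerEntry *e;

    assert(bs != NULL && fn != NULL);
    e = malloc(sizeof(*e));
    if (e == NULL) {
        return -1;
    }
    e->fn = fn;
    e->opaque = opaque;
    e->next = bs->dirty_handlers;
    bs->dirty_handlers = e;
    return 0;
}

/* Write path: notify every handler registered on this device only. */
static void bdrv_notify_dirty(BlockDriverState *bs, int64_t sector_num,
                              int nb_sectors)
{
    DirtyHandlerEntry *e;

    for (e = bs->dirty_handlers; e != NULL; e = e->next) {
        e->fn(bs, sector_num, nb_sectors, e->opaque);
    }
}

/* Example handler: adds the number of written sectors to *opaque. */
static void count_dirty_sectors(BlockDriverState *bs, int64_t sector_num,
                                int nb_sectors, void *opaque)
{
    (void)bs;
    (void)sector_num;
    *(int64_t *)opaque += nb_sectors;
}
```

Because the list hangs off the device, a device with no registered handler
produces no notifications, and two components can watch the same device
without knowing about each other.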

> I'd suggest tracking dirtiness in relatively large chunks (at least 2MB).

There are disadvantages to tracking dirtiness in such large chunks. A single
4KB write to disk will result in 2MB of migrated data.
For now I see a way to improve things by allocating the bitmap only when
migration starts rather than from the beginning. I will fix that.
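
The granularity trade-off can be shown with a small sketch. This is only an
illustration under my own assumptions: DirtyTracker and the dirty_tracker_*
functions are hypothetical names, the map uses one byte per chunk for
simplicity, and 512-byte sectors are assumed. The map is allocated only when
tracking starts, so it costs nothing before migration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SECTOR_SIZE 512  /* bytes per sector (assumed) */

/* Hypothetical tracker; names and layout are illustrative only. */
typedef struct DirtyTracker {
    int64_t chunk_sectors;  /* tracking granularity, in sectors */
    int64_t nb_chunks;
    uint8_t *map;           /* one byte per chunk; NULL until started */
} DirtyTracker;

/* Allocate the map when migration starts. Returns 0 on success, -1 on error. */
static int dirty_tracker_start(DirtyTracker *t, int64_t total_sectors,
                               int64_t chunk_bytes)
{
    assert(t != NULL && total_sectors > 0 && chunk_bytes >= SECTOR_SIZE);
    t->chunk_sectors = chunk_bytes / SECTOR_SIZE;
    t->nb_chunks = (total_sectors + t->chunk_sectors - 1) / t->chunk_sectors;
    t->map = calloc((size_t)t->nb_chunks, 1);
    return t->map != NULL ? 0 : -1;
}

/* Mark every chunk touched by a write; a no-op while tracking is off. */
static void dirty_tracker_mark(DirtyTracker *t, int64_t sector_num,
                               int nb_sectors)
{
    int64_t first, last, c;

    if (t->map == NULL || nb_sectors <= 0) {
        return;
    }
    first = sector_num / t->chunk_sectors;
    last = (sector_num + nb_sectors - 1) / t->chunk_sectors;
    for (c = first; c <= last && c < t->nb_chunks; c++) {
        t->map[c] = 1;
    }
}

/* Upper bound on the bytes to resend: dirty chunks times chunk size. */
static int64_t dirty_tracker_bytes_to_send(const DirtyTracker *t)
{
    int64_t c, dirty = 0;

    if (t->map == NULL) {
        return 0;
    }
    for (c = 0; c < t->nb_chunks; c++) {
        dirty += t->map[c];
    }
    return dirty * t->chunk_sectors * SECTOR_SIZE;
}

/* Release the map when migration ends. */
static void dirty_tracker_stop(DirtyTracker *t)
{
    free(t->map);
    t->map = NULL;
}
```

With a 2MB chunk size, marking one 8-sector (4KB) write leaves 2MB to send;
with a 4KB chunk size and an aligned write it leaves 4KB, at the price of a
map that is 512 times larger.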

> >
> > @@ -1359,6 +1370,10 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
> >      if (bdrv_check_request(bs, sector_num, nb_sectors))
> >          return NULL;
> >
> > +    if(bdrv_dirty_handler != NULL) {
> > +      bdrv_dirty_handler(bs, sector_num, nb_sectors);
> > +    }
> > +
> >      ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
> >                                 cb, opaque);
> >
>
> The check should be in the completion callback as AIO requests can be
> canceled.  You're potentially giving a false positive.

The problem is that the completion callback is private to the caller. The
only way is to intercept the code that calls the cb, but this code is format
specific, which means many places in the source code. For our purposes I
think we can live with treating a canceled write as a dirty block. What do
you think?
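
For comparison, marking at completion time would mean interposing on the
caller's callback, roughly as sketched below. This is only a sketch under my
own assumptions: DirtyOnComplete and the dirty_on_complete_* functions are
hypothetical, the int64_t counter stands in for a real dirty-tracking
structure, and it does not address how such a wrapper would fit the real
AIOCB lifetime or the format-specific cancel paths.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void CompletionFunc(void *opaque, int ret);

/* Hypothetical wrapper state: the caller's callback plus the written range. */
typedef struct DirtyOnComplete {
    CompletionFunc *cb;      /* caller's private completion callback */
    void *opaque;            /* caller's private state */
    int64_t sector_num;
    int nb_sectors;
    int64_t *dirty_sectors;  /* stand-in for a real dirty-tracking structure */
} DirtyOnComplete;

/* Wrap the caller's callback. Returns NULL if out of memory. */
static DirtyOnComplete *dirty_on_complete_wrap(CompletionFunc *cb,
                                               void *opaque,
                                               int64_t sector_num,
                                               int nb_sectors,
                                               int64_t *dirty_sectors)
{
    DirtyOnComplete *w;

    assert(cb != NULL && dirty_sectors != NULL);
    w = malloc(sizeof(*w));
    if (w == NULL) {
        return NULL;
    }
    w->cb = cb;
    w->opaque = opaque;
    w->sector_num = sector_num;
    w->nb_sectors = nb_sectors;
    w->dirty_sectors = dirty_sectors;
    return w;
}

/* Passed to the driver in place of the caller's callback. The range is
 * marked on any completion, including errors, since a failed write may
 * still have reached the disk partially. */
static void dirty_on_complete_cb(void *opaque, int ret)
{
    DirtyOnComplete *w = opaque;

    *w->dirty_sectors += w->nb_sectors;
    w->cb(w->opaque, ret);
    free(w);
}

/* Cancel path: the write never completed, so nothing is marked. */
static void dirty_on_complete_cancel(DirtyOnComplete *w)
{
    free(w);
}

/* Example caller callback: records the return code in *opaque. */
static void record_ret(void *opaque, int ret)
{
    *(int *)opaque = ret;
}
```

Marking at submit time, as the patch does now, needs none of this and only
errs on the safe side: a canceled write costs an extra block transfer, not
correctness.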

I will fix all of the above and will resend the patch.

Thanks for the review.
- Liran




