qemu-devel


From: Stefano Garzarella
Subject: Re: [Qemu-devel] [PATCH v2] block/rbd: increase dynamically the image size
Date: Wed, 8 May 2019 11:41:02 +0200
User-agent: NeoMutt/20180716

On Tue, May 07, 2019 at 11:43:50AM +0200, Kevin Wolf wrote:
> Am 06.05.2019 um 11:50 hat Stefano Garzarella geschrieben:
> > On Fri, May 03, 2019 at 01:21:23PM -0400, Jason Dillaman wrote:
> > > On Fri, May 3, 2019 at 12:30 PM Stefano Garzarella <address@hidden> wrote:
> > > >
> > > > RBD APIs don't allow us to write more than the size set with
> > > > rbd_create() or rbd_resize().
> > > > In order to support growing images (eg. qcow2), we resize the
> > > > image before write operations that exceed the current size.
> > > >
> > > > Signed-off-by: Stefano Garzarella <address@hidden>
> > > > ---
> > > > v2:
> > > >   - use bs->total_sectors instead of adding a new field [Kevin]
> > > >   - resize the image only during write operation [Kevin]
> > > >     for read operations, bdrv_aligned_preadv() already handles reads
> > > >     that exceed the length returned by bdrv_getlength(), so IMHO we can
> > > >     avoid handling it in the rbd driver
> > > > ---
> > > >  block/rbd.c | 14 +++++++++++++-
> > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/block/rbd.c b/block/rbd.c
> > > > index 0c549c9935..613e8f4982 100644
> > > > --- a/block/rbd.c
> > > > +++ b/block/rbd.c
> > > > @@ -934,13 +934,25 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
> > > >      }
> > > >
> > > >      switch (cmd) {
> > > > -    case RBD_AIO_WRITE:
> > > > +    case RBD_AIO_WRITE: {
> > > > +        /*
> > > > +         * RBD APIs don't allow us to write more than actual size, so in order
> > > > +         * to support growing images, we resize the image before write
> > > > +         * operations that exceed the current size.
> > > > +         */
> > > > +        if (off + size > bs->total_sectors * BDRV_SECTOR_SIZE) {
> > > 
> > > When will "bs->total_sectors" be refreshed to represent the correct
> > > current size? You wouldn't want a future write whose extent was
> > > greater than the original image size but less than that of a previous
> > > IO that expanded the image to attempt to shrink the image.
> > > 
> > 
> > Good point!
> > IIUC it can happen, because in the bdrv_aligned_pwritev() we do these
> > steps:
> > 1. call bdrv_driver_pwritev() that invokes "drv->bdrv_aio_pwritev" and
> >    then it waits calling "qemu_coroutine_yield()"
> > 2. call bdrv_co_write_req_finish() that updates the "bs->total_sectors"
> > 
> > Between steps 1 and 2, another request may be executed, so the issue
> > that you described can occur.
> > 
> > The solutions that I have in mind are:
> > a. Add a variable in the BDRVRBDState to track the latest resize.
> 
> This would work and be relatively simple.
> 
> > b. Call rbd_get_size() before the rbd_resize() to make sure we don't
> >    shrink the image.
> 
> I'm not sure if rbd_get_size() involves network traffic or other
> significant complexity. If so, I'd definitely avoid it.
> 
> > c. Update "bs->total_sectors" after the rbd_resize(), but I'm not
> >    sure that is allowed.
> > 
> > @Jason, @Kevin Do you have any advice?
> 
> We need to make sure to run everything that bdrv_co_write_req_finish()
> does for resizing an image:
> 
>     bs->total_sectors = end_sector;
>     bdrv_parent_cb_resize(bs);
>     bdrv_dirty_bitmap_truncate(bs, end_sector << BDRV_SECTOR_BITS);
> 
> Just duplicating that code wouldn't be good; if something is added, we'd
> probably forget to update rbd, too. So I think your solution c would at
> least involve refactoring the above code into a separate function that
> can be called from rbd.
> 
> But solution a might actually be the simplest. In this case, sorry for
> giving you bad advice in v1 of the patch.
> 

I agree with you, option 'a' should be the simplest to implement.

I'll send a v3 fixing this.

Thanks,
Stefano


