qemu-block

Re: RBD images and exclusive locking


From: Ilya Dryomov
Subject: Re: RBD images and exclusive locking
Date: Fri, 25 Mar 2022 18:10:52 +0100

On Fri, Mar 25, 2022 at 5:30 PM Peter Lieven <pl@kamp.de> wrote:
>
>
>
> > > On 25.03.2022 at 16:29, Ilya Dryomov <idryomov@gmail.com> wrote:
> >
> > On Fri, Mar 25, 2022 at 12:35 PM Peter Lieven <pl@kamp.de> wrote:
> >>
> >>> On 24.03.22 at 15:53, Peter Lieven wrote:
> >>> On 24.03.22 at 12:51, Peter Lieven wrote:
> >>>> Hi,
> >>>>
> >>>>
> >>>> following the thread on the Ceph mailing list about rbd and exclusive
> >>>> locks, I was wondering whether we should rewrite the rbd driver to try
> >>>> to acquire an exclusive lock on open instead of magically on the first
> >>>> write. This way we could terminate qemu right away if the rbd image is
> >>>> in use, as has been done for ordinary image files on a filesystem for
> >>>> some time now.
> >>>
> >>>
> >>> Digging a bit into the code and testing, it seems that the exclusive-lock
> >>> feature on an rbd image is not that exclusive. We can easily start several
> >>> qemu instances accessing the same image. Watching the locks on the image,
> >>> the owner of the one exclusive lock toggles between the instances. This
> >>> has the potential for severe corruption.
> >>>
> >>> I am not thinking of the case where a qemu instance gets killed or the
> >>> host dies. I am thinking of network issues where the disconnected old
> >>> instance of a VM suddenly kicks back in.
> >>>
> >>>
> >>> However, if I manually acquire an exclusive lock in qemu_rbd_open_image,
> >>> all other callers get EROFS.
> >>>
> >>> So what's the difference between the exclusive lock that a write request
> >>> obtains and an exclusive lock created by rbd_lock_acquire? From the
> >>> rbd lock ls perspective they are identical.
> >>>
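For illustration, here is a minimal standalone sketch (not QEMU code; the pool
name "rbd", the image name "test" and the ceph.conf path are made-up
assumptions) of the explicit lock acquisition being discussed. Per the
behaviour described above, a second process running this against the same
image should see rbd_lock_acquire() fail while the first holder is alive:

#include <stdio.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t ioctx;
    rbd_image_t image;
    int r;

    rados_create(&cluster, NULL);                 /* default client.admin */
    rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
    rados_connect(cluster);
    rados_ioctx_create(cluster, "rbd", &ioctx);   /* assumed pool name */

    r = rbd_open(ioctx, "test", &image, NULL);    /* assumed image name */
    if (r < 0) {
        fprintf(stderr, "rbd_open failed: %d\n", r);
        goto out_rados;
    }

    /* Take the exclusive lock explicitly instead of relying on the
     * implicit acquisition triggered by the first write. */
    r = rbd_lock_acquire(image, RBD_LOCK_MODE_EXCLUSIVE);
    if (r < 0) {
        fprintf(stderr, "rbd_lock_acquire failed: %d (image in use?)\n", r);
    } else {
        printf("exclusive lock acquired, press enter to release\n");
        getchar();
        rbd_lock_release(image);
    }

    rbd_close(image);
out_rados:
    rados_ioctx_destroy(ioctx);
    rados_shutdown(cluster);
    return r < 0 ? 1 : 0;
}

Build with e.g. cc -o rbdlock rbdlock.c -lrados -lrbd and run it twice
against the same image to observe the second acquisition failing.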
> >>>
> >>> Best
> >>>
> >>> Peter
> >>>
> >>
> >> To whom it may concern, this is my proof-of-concept code to add safe
> >> locking.
> >>
> >> Tested with live migration, including migration from an old version
> >> without the locking support.
> >>
> >> rbd_lock_acquire can take the lock that has been installed on the
> >> first write.
> >>
> >>
> >> Considerations:
> >>
> >> 1) add a switch to enable/disable locking in case someone mounts the image
> >> multiple times on purpose (default on?)
> >>
> >> 2) we might need to check if we already have the lock in
> >> bdrv_invalidate_cache (see the sketch below this list)
> >>
> >> 3) I am aware of bdrv_{check,set,abort}_perm. If these are the better
> >> hooks, how do we use them with only a single lock?
> >>
> >> 4) rbd_lock_release is automatically invoked on image close.
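A possible shape for consideration 2), purely as a sketch: librbd's
rbd_is_exclusive_lock_owner() can tell whether this client already holds the
lock, so re-acquisition in bdrv_invalidate_cache could be skipped. The helper
name below is hypothetical and not part of the posted patch:

#include <rbd/librbd.h>

/* Hypothetical helper: acquire the exclusive lock only if we do not
 * already own it (e.g. after an incoming migration). */
static int qemu_rbd_maybe_acquire_lock(rbd_image_t image)
{
    int owner = 0;
    int r = rbd_is_exclusive_lock_owner(image, &owner);

    if (r < 0) {
        return r;
    }
    if (owner) {
        return 0;   /* nothing to do, the lock is already ours */
    }
    return rbd_lock_acquire(image, RBD_LOCK_MODE_EXCLUSIVE);
}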
> >>
> >>
> >> Happy weekend
> >>
> >> Peter
> >>
> >>
> >> ---8<---
> >>
> >>
> >> diff --git a/block/rbd.c b/block/rbd.c
> >> index 240b267c27..13a597e526 100644
> >> --- a/block/rbd.c
> >> +++ b/block/rbd.c
> >> @@ -1047,6 +1047,24 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
> >>         goto failed_open;
> >>     }
> >>
> >> +    if (!(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) {
> >> +        error_report("qemu_rbd_open: rbd_lock_acquire");
> >> +        r = rbd_lock_acquire(s->image, RBD_LOCK_MODE_EXCLUSIVE);
> >
> > This introduces a dependency on exclusive-lock image feature.  For an image
> > without it, this call would always fail with EINVAL.
> >
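A sketch of how that dependency could be avoided (illustrative only; the
helper name is made up): query the image features first and skip the explicit
lock when exclusive-lock is not enabled:

#include <stdint.h>
#include <rbd/librbd.h>

/* Hypothetical helper: only attempt rbd_lock_acquire() when the image
 * actually has the exclusive-lock feature, avoiding the EINVAL case. */
static int qemu_rbd_try_exclusive_lock(rbd_image_t image)
{
    uint64_t features = 0;
    int r = rbd_get_features(image, &features);

    if (r < 0) {
        return r;
    }
    if (!(features & RBD_FEATURE_EXCLUSIVE_LOCK)) {
        return 0;   /* feature not enabled; nothing to lock against */
    }
    return rbd_lock_acquire(image, RBD_LOCK_MODE_EXCLUSIVE);
}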
> > Going beyond that, I'm not sure this patch is needed.  --exclusive mapping
> > option of krbd and rbd-nbd (which is what is essentially being replicated 
> > here)
> > is off by default and hasn't seen much interest among users, at least as far
> > as I'm aware.
> >
> > Thanks,
> >
> >                Ilya
>
> I, at least, would urgently need it. I was not aware that the exclusive-lock
> feature does not guarantee that the image is not mounted twice. There is an
> old thread from Jason who said that it's not what one would expect and that
> he hoped it would get implemented as soon as the appropriate hooks are there
> (I think this was back in 2016 or so).
>
> The reason is that we try to auto-recover from hardware failures. If our
> system is not able to detect this properly for whatever reason and recovers
> in error, that would cause data corruption.
>
> So I would like to have that feature and would vote for off by default. If
> someone (like me) enables it, it's okay if the locking fails (with an
> appropriate error message, of course).

A new "exclusive" option (not a very good name for what it actually
does but I guess we are stuck with it), off by default, is fine with
me -- it would be consistent with the rest of the RBD ecosystem.
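For concreteness, a sketch of how such an off-by-default option could gate
the hunk from the posted patch -- the "exclusive" field on the rbd options
is purely an assumption here, not an agreed interface:

    /* Inside qemu_rbd_open(), after the image has been opened successfully.
     * "opts->exclusive" is a hypothetical, off-by-default option. */
    if (opts->exclusive && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) {
        r = rbd_lock_acquire(s->image, RBD_LOCK_MODE_EXCLUSIVE);
        if (r < 0) {
            error_setg_errno(errp, -r, "error acquiring exclusive lock");
            goto failed_open;
        }
    }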

I have to admit to not understanding the qemu_rbd_co_invalidate_cache
change and also consideration 3).  It would be great if you could
elaborate on those here or in the repost.

Thanks,

                Ilya


