qemu-block

Re: RBD images and exclusive locking


From: Peter Lieven
Subject: Re: RBD images and exclusive locking
Date: Fri, 25 Mar 2022 17:30:34 +0100


> Am 25.03.2022 um 16:29 schrieb Ilya Dryomov <idryomov@gmail.com>:
> 
> On Fri, Mar 25, 2022 at 12:35 PM Peter Lieven <pl@kamp.de> wrote:
>> 
>>> Am 24.03.22 um 15:53 schrieb Peter Lieven:
>>> Am 24.03.22 um 12:51 schrieb Peter Lieven:
>>>> Hi,
>>>> 
>>>> 
>>>> following the thread on the ceph ml about rbd and exclusive locks, I was
>>>> wondering if we should rewrite the rbd driver to try to acquire an
>>>> exclusive lock on open and not magically on the first write. This way we
>>>> could directly terminate qemu if the rbd image is in use, as has been done
>>>> with ordinary image files on a filesystem for some time now.
>>> 
>>> 
>>> Digging a bit into the code and testing, it seems that the exclusive lock
>>> on an rbd image is not that exclusive. We can easily start several qemu
>>> instances accessing the same image. Watching the locks on the image, the
>>> owner of the one exclusive lock toggles between the instances. This has
>>> potential for severe corruption.
>>> 
>>> I am not thinking of the case where a qemu instance gets killed or the
>>> host dies. I am thinking of network issues where the disconnected old
>>> instance of a VM suddenly kicks back in.
>>> 
>>> 
>>> However, if I manually acquire an exclusive lock in qemu_rbd_open_image,
>>> all other callers get EROFS.
>>> 
>>> So what's the difference between the exclusive lock that a write request
>>> obtains and an exclusive lock created by rbd_lock_acquire? From the
>>> "rbd lock ls" perspective they are identical.
>>> 
>>> 
>>> Best
>>> 
>>> Peter
>>> 
>> 
>> To whom it may concern, this is my proof-of-concept code to add safe
>> locking.

>> Tested with live migration, including migration from an old version
>> without the locking support.

>> rbd_lock_acquire can take over the lock that has been installed on the
>> first write.
>> 
>> 
>> Considerations:
>> 
>> 1) add a switch to enable/disable locking in case someone mounts the image
>> multiple times on purpose (default on?)

>> 2) we might need to check if we already have the lock in
>> bdrv_invalidate_cache

>> 3) I am aware of bdrv_{check,set,abort}_perm. If these are the better
>> hooks, how would we use them with only a single lock?

>> 4) rbd_lock_release is automatically invoked on image close.
>> 
>> 
>> Happy weekend
>> 
>> Peter
>> 
>> 
>> ---8<---
>> 
>> 
>> diff --git a/block/rbd.c b/block/rbd.c
>> index 240b267c27..13a597e526 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -1047,6 +1047,24 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>         goto failed_open;
>>     }
>> 
>> +    if (!(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) {
>> +        error_report("qemu_rbd_open: rbd_lock_acquire");
>> +        r = rbd_lock_acquire(s->image, RBD_LOCK_MODE_EXCLUSIVE);
> 
> This introduces a dependency on the exclusive-lock image feature.  For an
> image without it, this call would always fail with EINVAL.
> 
> Going beyond that, I'm not sure this patch is needed.  The --exclusive
> mapping option of krbd and rbd-nbd (which is what is essentially being
> replicated here) is off by default and hasn't seen much interest among
> users, at least as far as I'm aware.
> 
> Thanks,
> 
>                Ilya
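
The open-time semantics being proposed — fail immediately at open rather than
"magically" on the first write — are what qemu already provides for ordinary
image files on a filesystem via OS file locks, as the first mail notes. A
minimal Python sketch of that behaviour, using flock(2) on a plain file purely
as an analogy (this is not librbd; the function name and path are illustrative
only):

```python
import fcntl
import os

def open_image_exclusive(path):
    """Open an image and fail at once if another opener holds the lock.

    Analogy only: models 'acquire exclusive lock on open' instead of
    'on first write'. The rbd patch above uses rbd_lock_acquire() for
    the same fail-fast effect on an RBD image.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        # Non-blocking exclusive lock: raises OSError (EWOULDBLOCK)
        # if another open file description already holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise RuntimeError("image %r is in use" % path)
    return fd
```

With this pattern a second opener is rejected up front, instead of two live
writers silently trading ownership of the image.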

I at least would urgently need it. I was not aware that the exclusive-lock
feature does not guarantee that the image is not mounted twice. There is an
old thread from Jason who said that it's not what one would expect, and that
he hoped it would get implemented as soon as the appropriate hooks are there
(I think this was back in 2016 or so).

The reason is that we try to auto-recover from hardware failures. If our
system fails to detect such a failure properly for whatever reason and
recovers in error, that would cause data corruption.

So I would like that feature and would vote for off by default. If someone
(like me) enables it, it's okay if the locking fails (with an appropriate
error message, of course).

Peter
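
To make the "toggling" corruption scenario from the thread concrete: by
default the librbd exclusive lock is cooperative — the current owner hands the
lock over when another client requests it, which is why the owner toggles
between competing qemu instances. A toy Python model contrasting the two
policies (a deliberate simplification for illustration, not the librbd
implementation):

```python
class ToyImageLock:
    """Toy model contrasting librbd's default cooperative hand-over
    with a hard-exclusive rbd_lock_acquire-style policy.
    Simplified for illustration; not the real librbd state machine."""

    def __init__(self, cooperative=True):
        self.cooperative = cooperative
        self.owner = None

    def acquire(self, client):
        if self.owner is None or self.owner == client:
            self.owner = client
            return True
        if self.cooperative:
            # Cooperative mode: the holder releases on request, so
            # ownership toggles between competing instances -- a stale,
            # reconnecting VM can take the lock back.
            self.owner = client
            return True
        # Hard-exclusive mode: the second opener is refused outright.
        return False

# Cooperative: both instances "succeed" -> corruption risk.
coop = ToyImageLock(cooperative=True)
assert coop.acquire("vm-new")
assert coop.acquire("vm-stale")

# Hard-exclusive: the stale instance fails fast instead.
excl = ToyImageLock(cooperative=False)
assert excl.acquire("vm-new")
assert not excl.acquire("vm-stale")
```

This is the distinction the patch is after: in cooperative mode both openers
appear to hold "the one exclusive lock" in turn, while the hard-exclusive
policy gives the second opener a clean, immediate error.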



