
Re: RBD images and exclusive locking


From: Peter Lieven
Subject: Re: RBD images and exclusive locking
Date: Fri, 25 Mar 2022 12:35:01 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 24.03.22 15:53, Peter Lieven wrote:
> On 24.03.22 12:51, Peter Lieven wrote:
>> Hi,
>>
>>
>> Following the thread on the ceph ML about rbd and exclusive locks, I was
>> wondering whether we should rewrite the rbd driver to try to acquire an
>> exclusive lock on open and not magically on the first write. This way we
>> could directly terminate qemu if the rbd image is in use, as has been done
>> with ordinary image files on a filesystem for some time now.
>
>
> Digging a bit into the code and testing, it seems that the exclusive lock
> on an rbd image is not that exclusive. We can easily start several qemu
> instances accessing the same image. Watching the locks on the image, the
> owner of the one exclusive lock toggles between the instances. This has
> potential for severe corruption.
>
> I am not thinking of the case where a qemu instance gets killed or the
> host dies. I am thinking of network issues where the disconnected old
> instance of a VM suddenly kicks back in.
>
> However, if I manually acquire an exclusive lock in qemu_rbd_open_image,
> all other callers get EROFS.
>
> So what's the difference between the exclusive lock that a write request
> obtains and an exclusive lock created by rbd_lock_acquire? From the
> 'rbd lock ls' perspective they are identical.
>
> Best
>
> Peter
>

To whom it may concern: this is my proof-of-concept code to add safe locking.

Tested with live migration, including migration from an old version without
the locking support.

rbd_lock_acquire can take over the lock that was installed on the first write.
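
For anyone who wants to poke at the lock state outside of QEMU: a minimal
librbd sketch like the one below acquires the exclusive lock the same way
the PoC does and then lists the current owner, which is the programmatic
equivalent of 'rbd lock ls'. Pool "rbd", image "test-image" and the omitted
error handling are placeholders, not part of the PoC.

/* lock-peek.c -- sketch only; build with: cc lock-peek.c -lrados -lrbd */
#include <stdio.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t ioctx;
    rbd_image_t image;

    rados_create(&cluster, NULL);          /* default client id */
    rados_conf_read_file(cluster, NULL);   /* default ceph.conf search path */
    rados_connect(cluster);
    rados_ioctx_create(cluster, "rbd", &ioctx);   /* pool name assumed */
    rbd_open(ioctx, "test-image", &image, NULL);  /* image name assumed */

    /* Acquire the exclusive lock; as described above, other callers that
     * try the same while it is held get -EROFS. */
    int r = rbd_lock_acquire(image, RBD_LOCK_MODE_EXCLUSIVE);
    printf("rbd_lock_acquire: %d\n", r);

    /* List the current lock owner (what 'rbd lock ls' shows). */
    rbd_lock_mode_t mode;
    char *owners[1];
    size_t num_owners = 1;
    if (rbd_lock_get_owners(image, &mode, owners, &num_owners) == 0 &&
        num_owners > 0) {
        printf("lock owner: %s\n", owners[0]);
        rbd_lock_get_owners_cleanup(owners, num_owners);
    }

    rbd_lock_release(image);
    rbd_close(image);
    rados_ioctx_destroy(ioctx);
    rados_shutdown(cluster);
    return 0;
}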


Considerations:

 1) add a switch to enable/disable locking in case someone mounts the image
    multiple times on purpose (default on?) (see the helper sketch after
    this list)

 2) we might need to check whether we already hold the lock in
    bdrv_invalidate_cache (also covered by the helper sketch below)

 3) I am aware of bdrv_{check,set,abort}_perm. If these are the better
    hooks, how would we use them with only a single lock?

 4) rbd_lock_release is automatically invoked on image close.
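
Regarding 1) and 2), one possible shape (untested sketch only; the "locking"
flag stands in for a hypothetical per-image option, default on) would be a
small helper that both qemu_rbd_open() and qemu_rbd_co_invalidate_cache()
call, using rbd_is_exclusive_lock_owner() so we do not try to re-acquire a
lock we already hold, e.g. after an incoming migration:

static int qemu_rbd_acquire_lock(BDRVRBDState *s, bool locking, Error **errp)
{
    int owned = 0;
    int r;

    if (!locking) {
        return 0;               /* user explicitly allows shared use */
    }

    r = rbd_is_exclusive_lock_owner(s->image, &owned);
    if (r == 0 && owned) {
        return 0;               /* we already hold the lock, nothing to do */
    }

    r = rbd_lock_acquire(s->image, RBD_LOCK_MODE_EXCLUSIVE);
    if (r < 0) {
        error_setg_errno(errp, -r, "failed to acquire excl lock for %s",
                         s->image_name);
        if (r == -EROFS) {
            error_append_hint(errp, "Is another process using the image?\n");
        }
    }
    return r;
}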


Happy weekend

Peter


---8<---


diff --git a/block/rbd.c b/block/rbd.c
index 240b267c27..13a597e526 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1047,6 +1047,24 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
         goto failed_open;
     }
 
+    if (!(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) {
+        error_report("qemu_rbd_open: rbd_lock_acquire");
+        r = rbd_lock_acquire(s->image, RBD_LOCK_MODE_EXCLUSIVE);
+        if (r < 0) {
+            if (r == -EROFS) {
+                error_setg(errp, "failed to acquire excl lock for %s",
+                           s->image_name);
+                error_append_hint(errp,
+                                  "Is another process using the image?\n");
+            } else {
+                error_setg_errno(errp, -r, "failed to acquire excl lock for %s",
+                                 s->image_name);
+            }
+            goto failed_post_open;
+        }
+        error_report("qemu_rbd_open: rbd_lock_acquire success");
+    }
+
     r = rbd_stat(s->image, &info, sizeof(info));
     if (r < 0) {
         error_setg_errno(errp, -r, "error getting image info from %s",
@@ -1266,17 +1284,50 @@ static int qemu_rbd_snap_list(BlockDriverState *bs,
     return snap_count;
 }
 
-#ifdef LIBRBD_SUPPORTS_INVALIDATE
 static void coroutine_fn qemu_rbd_co_invalidate_cache(BlockDriverState *bs,
                                                       Error **errp)
 {
     BDRVRBDState *s = bs->opaque;
-    int r = rbd_invalidate_cache(s->image);
+    int r;
+
+    if (!(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) {
+        error_report("qemu_rbd_co_invalidate_cache: rbd_lock_acquire");
+        r = rbd_lock_acquire(s->image, RBD_LOCK_MODE_EXCLUSIVE);
+        if (r < 0) {
+            if (r == -EROFS) {
+                error_setg(errp, "failed to acquire excl lock for %s",
+                           s->image_name);
+                error_append_hint(errp,
+                                  "Is another process using the image?\n");
+            } else {
+                error_setg_errno(errp, -r, "failed to acquire excl lock for %s",
+                                 s->image_name);
+            }
+            return;
+        }
+        error_report("qemu_rbd_co_invalidate_cache: rbd_lock_acquire success");
+    }
+
+    r = rbd_invalidate_cache(s->image);
     if (r < 0) {
         error_setg_errno(errp, -r, "Failed to invalidate the cache");
     }
 }
-#endif
+
+static int qemu_rbd_inactivate(BlockDriverState *bs)
+{
+    BDRVRBDState *s = bs->opaque;
+    int r;
+    error_report("qemu_rbd_inactivate: rbd_lock_release");
+    r = rbd_lock_release(s->image);
+    if (r < 0) {
+        error_report("failed to release lock for %s (%s)", s->image_name,
+                     strerror(-r));
+    } else {
+        error_report("qemu_rbd_inactivate: rbd_lock_release success");
+    }
+    return r;
+}
 
 static QemuOptsList qemu_rbd_create_opts = {
     .name = "rbd-create-opts",
@@ -1338,11 +1389,12 @@ static BlockDriver bdrv_rbd = {
     .bdrv_co_pwrite_zeroes          = qemu_rbd_co_pwrite_zeroes,
     .bdrv_co_block_status           = qemu_rbd_co_block_status,
 
-    .bdrv_snapshot_create   = qemu_rbd_snap_create,
-    .bdrv_snapshot_delete   = qemu_rbd_snap_remove,
-    .bdrv_snapshot_list     = qemu_rbd_snap_list,
-    .bdrv_snapshot_goto     = qemu_rbd_snap_rollback,
+    .bdrv_snapshot_create     = qemu_rbd_snap_create,
+    .bdrv_snapshot_delete     = qemu_rbd_snap_remove,
+    .bdrv_snapshot_list       = qemu_rbd_snap_list,
+    .bdrv_snapshot_goto       = qemu_rbd_snap_rollback,
     .bdrv_co_invalidate_cache = qemu_rbd_co_invalidate_cache,
+    .bdrv_inactivate          = qemu_rbd_inactivate,
 
     .strong_runtime_opts    = qemu_rbd_strong_runtime_opts,
 };

--->8---




