qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] block/rbd: workaround for ceph issue #53784


From: Peter Lieven
Subject: Re: [PATCH 2/2] block/rbd: workaround for ceph issue #53784
Date: Wed, 12 Jan 2022 21:37:16 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

Am 12.01.22 um 20:51 schrieb Ilya Dryomov:
> On Wed, Jan 12, 2022 at 1:32 PM Peter Lieven <pl@kamp.de> wrote:
>> Am 12.01.22 um 13:22 schrieb Ilya Dryomov:
>>> On Wed, Jan 12, 2022 at 12:55 PM Peter Lieven <pl@kamp.de> wrote:
>>>> Am 12.01.22 um 10:59 schrieb Ilya Dryomov:
>>>>> On Mon, Jan 10, 2022 at 12:43 PM Peter Lieven <pl@kamp.de> wrote:
>>>>>> librbd had a bug until early 2022 that affected all versions of ceph that
>>>>>> supported fast-diff. This bug results in reporting of incorrect offsets
>>>>>> if the offset parameter to rbd_diff_iterate2 is not object aligned.
>>>>>> Work around this bug by rounding down the offset to object boundaries.
>>>>>>
>>>>>> Fixes: https://tracker.ceph.com/issues/53784
>>>>> I don't think the Fixes tag is appropriate here.  Linking librbd
>>>>> ticket is fine but this patch doesn't really fix anything.
>>>> Okay, I will change that to See:
>>> It's already linked in the source code, up to you if you also want to
>>> link it in the description.
>>>
>>>>>> Cc: qemu-stable@nongnu.org
>>>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>>>> ---
>>>>>>  block/rbd.c | 17 ++++++++++++++++-
>>>>>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/block/rbd.c b/block/rbd.c
>>>>>> index 5e9dc91d81..260cb9f4b4 100644
>>>>>> --- a/block/rbd.c
>>>>>> +++ b/block/rbd.c
>>>>>> @@ -1333,6 +1333,7 @@ static int coroutine_fn 
>>>>>> qemu_rbd_co_block_status(BlockDriverState *bs,
>>>>>>      int status, r;
>>>>>>      RBDDiffIterateReq req = { .offs = offset };
>>>>>>      uint64_t features, flags;
>>>>>> +    int64_t head;
>>>>>>
>>>>>>      assert(offset + bytes <= s->image_size);
>>>>>>
>>>>>> @@ -1360,6 +1361,19 @@ static int coroutine_fn 
>>>>>> qemu_rbd_co_block_status(BlockDriverState *bs,
>>>>>>          return status;
>>>>>>      }
>>>>>>
>>>>>> +    /*
>>>>>> +     * librbd had a bug until early 2022 that affected all versions of 
>>>>>> ceph that
>>>>>> +     * supported fast-diff. This bug results in reporting of incorrect 
>>>>>> offsets
>>>>>> +     * if the offset parameter to rbd_diff_iterate2 is not object 
>>>>>> aligned.
>>>>>> +     * Work around this bug by rounding down the offset to object 
>>>>>> boundaries.
>>>>>> +     *
>>>>>> +     * See: https://tracker.ceph.com/issues/53784
>>>>>> +     */
>>>>>> +    head = offset & (s->object_size - 1);
>>>>>> +    offset -= head;
>>>>>> +    req.offs -= head;
>>>>>> +    bytes += head;
>>>>> So it looks like the intention is to have more or less a permanent
>>>>> workaround since all librbd versions are affected, right?  For that,
>>>>> I think we would need to also reject custom striping patterns and
>>>>> clones.  For the above to be reliable, the image has to be standalone
>>>>> and have a default striping pattern (stripe_unit == object_size &&
>>>>> stripe_count == 1).  Otherwise, behave as if fast-diff is disabled or
>>>>> invalid.
>>>> Do you have a fealing how many users use a different striping pattern than 
>>>> default?
>>> Very few.
>>>
>>>> What about EC backed pools?
>>> In this context EC pools behave exactly the same as replicated pools.
>>>
>>>> Do you have another idea how we can detect if the librbd version is broken?
>>> No.  Initially I wanted to just fix these bugs in librbd, relying on
>>> the assumption that setups with a new QEMU should also have a fairly
>>> new librbd.  But after looking at various distros and realizing the
>>> extent of rbd_diff_iterate2() issues, I think a long-term workaround
>>> in QEMU makes sense.  A configure-time check for known good versions
>>> of librbd can be added later if someone feels like it.
>>
>> Okay, I will add a check on image open if the striping fits or
>>
>> we have a clone. Can you explain how I would best check for it?
> ... if _we_ are a clone.  Use rbd_get_parent_info(), it returns 0 if an
> image has a parent and -ENOENT if an image is standalone.  Keep in mind
> that the image can be flattened (i.e. made standalone) at any moment so
> this should be checked on each invocation.
>
> To get striping parameters, use rbd_get_stripe_unit() and
> rbd_get_stripe_count() after checking for RBD_FEATURE_STRIPINGV2
> feature.  This can be done just on image open, but since features
> are queried on each invocation anyway and very few users are going
> to have a custom striping pattern (and therefore have this feature
> enabled), I'd do it here as well.
>
> Thanks,
>
>                 Ilya

The flattening is not a problem because if we detect a clone we disable 
get_block_status

for the session. If it gets flattened later it doesn't hurt we have just no 
allocation info.

So at least these checks could be done at image open.


To be honest, all these checks and workaround stuff doesn't feel right. We try 
to fix something that is broken

from scratch and make the qemu block driver quite ugly doing so.


I would personally go and drop get_block_status support for all existing librbd 
versions and add a flag to librbd that reports that rbd_diff_iterate2 is not 
broken

and check for that and only support get_block_status if it is present.


Or if you can confirm that rbd_diff_iterate2 does at least work for a query on 
the full image support get_block_status only for read-only

opened image files (this is true for our both current use cases backing file 
and qemu-img convert with rbd source) and read and cache

the full allocation status at the first invocation of get_block_status (and of 
course drop it on invalidate_cache or bdrv_reopen).

This would also work around the nasty algorithm that tries to handle all 
callback cases.


Best,

Peter





reply via email to

[Prev in Thread] Current Thread [Next in Thread]