[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace
From: |
Max Reitz |
Subject: |
Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace() |
Date: |
Thu, 6 Feb 2020 17:44:52 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1 |
On 06.02.20 16:42, Kevin Wolf wrote:
> Am 06.02.2020 um 16:19 hat Max Reitz geschrieben:
>> On 06.02.20 15:42, Kevin Wolf wrote:
>>> Am 06.02.2020 um 11:21 hat Max Reitz geschrieben:
>>>> On 05.02.20 16:55, Kevin Wolf wrote:
>>>>> Am 11.11.2019 um 17:02 hat Max Reitz geschrieben:
>>>>>> Signed-off-by: Max Reitz <address@hidden>
>>>>>> ---
>>>>>> block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> 1 file changed, 62 insertions(+)
>>>>>>
>>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>>> index 3a824e77e3..8ee03e9baf 100644
>>>>>> --- a/block/quorum.c
>>>>>> +++ b/block/quorum.c
>>>>>> @@ -825,6 +825,67 @@ static bool
>>>>>> quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>>>>> return false;
>>>>>> }
>>>>>>
>>>>>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>>>>>> + BlockDriverState *to_replace)
>>>>>> +{
>>>>>> + BDRVQuorumState *s = bs->opaque;
>>>>>> + int i;
>>>>>> +
>>>>>> + for (i = 0; i < s->num_children; i++) {
>>>>>> + /*
>>>>>> + * We have no idea whether our children show the same data as
>>>>>> + * this node (@bs). It is actually highly likely that
>>>>>> + * @to_replace does not, because replacing a broken child is
>>>>>> + * one of the main use cases here.
>>>>>> + *
>>>>>> + * We do know that the new BDS will match @bs, so replacing
>>>>>> + * any of our children by it will be safe. It cannot change
>>>>>> + * the data this quorum node presents to its parents.
>>>>>> + *
>>>>>> + * However, replacing @to_replace by @bs in any of our
>>>>>> + * children's chains may change visible data somewhere in
>>>>>> + * there. We therefore cannot recurse down those chains with
>>>>>> + * bdrv_recurse_can_replace().
>>>>>> + * (More formally, bdrv_recurse_can_replace() requires that
>>>>>> + * @to_replace will be replaced by something matching the @bs
>>>>>> + * passed to it. We cannot guarantee that.)
>>>>>> + *
>>>>>> + * Thus, we can only check whether any of our immediate
>>>>>> + * children matches @to_replace.
>>>>>> + *
>>>>>> + * (In the future, we might add a function to recurse down a
>>>>>> + * chain that checks that nothing there cares about a change
>>>>>> + * in data from the respective child in question. For
>>>>>> + * example, most filters do not care when their child's data
>>>>>> + * suddenly changes, as long as their parents do not care.)
>>>>>> + */
>>>>>> + if (s->children[i].child->bs == to_replace) {
>>>>>> + Error *local_err = NULL;
>>>>>> +
>>>>>> + /*
>>>>>> + * We now have to ensure that there is no other parent
>>>>>> + * that cares about replacing this child by a node with
>>>>>> + * potentially different data.
>>>>>> + */
>>>>>> + s->children[i].to_be_replaced = true;
>>>>>> + bdrv_child_refresh_perms(bs, s->children[i].child,
>>>>>> &local_err);
>>>>>> +
>>>>>> + /* Revert permissions */
>>>>>> + s->children[i].to_be_replaced = false;
>>>>>> + bdrv_child_refresh_perms(bs, s->children[i].child,
>>>>>> &error_abort);
>>>>>
>>>>> Quite a hack. The two obvious problems are:
>>>>>
>>>>> 1. We can't guarantee that we can actually revert the permissions. I
>>>>> think we ignore failure to loosen permissions meanwhile so that at
>>>>> least the &error_abort doesn't trigger, but bs could still be in the
>>>>> wrong state afterwards.
>>>>
>>>> I thought we guaranteed that loosening permissions never fails.
>>>>
>>>> (Well, you know. It may “leak” permissions, but we’d never get an error
>>>> here so there’s nothing to handle anyway.)
>>>
>>> This is what I meant. We ignore the failure (i.e. don't return an error),
>>> but the result still isn't completely correct ("leaked" permissions).
>>>
>>>>> It would be cleaner to use check+abort instead of actually setting
>>>>> the new permission.
>>>>
>>>> Oh. Yes. Maybe. It does require more code, though, because I’d rather
>>>> not use bdrv_check_update_perm() from here as-is.
>>>
>>> I'm not saying you need to do it, just that it would be cleaner. :-)
>>
>> It would. Thanks for the suggestion, I obviously didn’t think of it.
>> (Or there’d be a comment on how this is not the best way in theory, but
>> in practice it’s good enough.) I suppose I’ll see how what I can do.
>>
>>>>> 2. As aborting the permission change makes more obvious, we're checking
>>>>> something that might not be true any more when we actually make the
>>>>> change.
>>>>
>>>> True. I tried to do it right by having a post-replace cleanup function,
>>>> but after a while that was just going nowhere, really. So I just went
>>>> with what’s patch 13 here.
>>>>
>>>> But isn’t 13 enough, actually? It check can_replace right before
>>>> replacing in a drained section. I can’t imagine the permissions to
>>>> change there.
>>>
>>> Permissions are tied to file locks, so an external process can just grab
>>> the locks in between.
>>
>> Ah, right, I didn’t think of that.
>>
>>> But if I understand correctly, all we try here is
>>> to have an additional safeguard to prevent the user from doing stupid
>>> things. So I guess not being 100% is fine as long as it's documented in
>>> the code.
>>
>> Yes. I just think it actually would be 100 % in practice, so I wondered
>> whether it would need to be documented.
>>
>> You’re right, though, it isn’t 100 %, so it should definitely be
>> documented. Maybe something like
>>
>> In theory, we would have to keep the permissions tightened until the
>> node is replaced. In practice, that would require post-replacement
>> cleanup infrastructure, which we do not have, and which would be
>> unreasonably complex to implement.
>
> Sounds good until here.
>
>> Therefore, all we can do is require
>> anyone who wants to replace one node by some potentially unrelated other
>> node (i.e., the mirror job on completion) to invoke
>> bdrv_recurse_can_replace() immediately before and thus minimize the time
>> during which some condition may arise that might forbid the swap.
>>
>> ?
>
> This second part of your suggested comment could be dropped, as far as
> I'm concerned. If anything, it's part of the contract and would belong
> in the bdrv_recurse_can_replace() documentation.
>
> However, I think I would mention why not being 100% is okay: The part
> with "additional safeguard to prevent the user from doing stupid
> things", and that it doesn't make a difference if the user runs the
> correct command.
OK.
Max
signature.asc
Description: OpenPGP digital signature