qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback


From: Max Reitz
Subject: Re: [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback
Date: Thu, 14 Nov 2019 14:11:27 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1

On 11.09.19 13:00, Max Reitz wrote:
> On 11.09.19 12:31, Kevin Wolf wrote:
>> Am 11.09.2019 um 12:00 hat Max Reitz geschrieben:
>>> On 11.09.19 10:27, Kevin Wolf wrote:
>>>> Am 11.09.2019 um 09:37 hat Max Reitz geschrieben:
>>>>> On 11.09.19 08:55, Kevin Wolf wrote:
>>>>>> Well, by default the primary child, which should cover like 90% of the
>>>>>> drivers?
>>>>>
>>>>> Hm, yes.
>>>>>
>>>>> But I still think that the drivers that do not want to count every
>>>>> single non-COW child are the exception.
>>>>
>>>> They are, but drivers that want to count more than their primary node
>>>> are exceptions, too. And I think you're more likely to remember adding
>>>> the callback when you want to have a certain feature, not when you don't
>>>> want to have it.
>>>>
>>>> I really think we're likely to forget adding the callback where we need
>>>> to disable the feature.
>>>
>>> Well, I mean, we did forget adding it for qcow2.
>>
>> I'm afraid I have to agree. So the conclusion is that we won't get it
>> right anyway?
>>
>>>> I can see two options that should address both of our views:
>>>>
>>>> 1. Just don't have a fallback at all, make the callback mandatory and
>>>>    provide implementations in block.c that can be referred to in
>>>>    BlockDriver. Not specifying the callback causes an assertion failure,
>>>>    so we'd hopefully notice it quite early (assuming that we run either
>>>>    'qemu-img info' or 'query-block' on a configuration with the block
>>>>    driver, but I think that's faily safe to assume).
>>>
>>> Hm.  Seems a bit much, but if we can’t agree on what’s a good general
>>> implementation that works for everything, this is probably the only
>>> thing that would actually keep us from forgetting to add special cases.
>>>
>>> Though I actually don’t know.  I’d probably add two globally available
>>> helpers, one that returns the sum of everything but the backing node,
>>> and one that just returns the primary node.
>>
>> Yes, I think this is the same as I meant by "provide implementations in
>> block.c".
>>
>>> Now if I were to make qcow2 use the primary node helper function, would
>>> we have remembered changing it once we added a data file?
>>>
>>> Hmm.  Maybe not, but it should be OK to just make everything use the sum
>>> helper, except the drivers that want the primary node.  That should work
>>> for all cases.  (I think that whenever a format driver suddenly gains
>>> more child nodes, we probably will want to count them.  OTOH, everything
>>> that has nodes that shouldn’t be counted probably always wants to use
>>> the primary node helper function from the start.)
>>
>> The job filter nodes have only one child currently, which should be
>> counted. We'll add other children that shouldn't be counted only later.
>>
>> But we already have an idea of what possible extensions look like, so we
>> can probably choose the right function from the start.
> 
> Yep.
> 
>>>> 2. Make the 90% solution a 100% solution: Allow drivers to have multiple
>>>>    storage children (for vmdk) and then have the fallback add up the
>>>>    primary child plus all storage children. This is what I suggested as
>>>>    the documented semantics in my initial reply to this patch (that you
>>>>    chose not to answer).
>>>
>>> I didn’t answer that because I didn’t disagree.
>>>
>>>>    Adding the size of storage children covers qcow2 and vmdk.
>>>
>>> That’s of course exactly what we’re trying to do, but the question is,
>>> how do we figure out that storage children?  Make it a per-BdrvChild
>>> attribute?  That seems rather heavy-handed, because I think we’d need it
>>> only here.
>>
>> Well, you added bdrv_storage_child().I'd argue this interface is wrong
> 
> Yes, it probably is.
> 
>> because it assumes that only one storage child exists. You just didn't
>> implement it for vmdk so that the problem didn't become apparent. It
>> would have to return a list rather than a single child. So fixing the
>> interface and then using it is what I was thinking.
>>
>> Now that you mention a per-BdrvChild attribute, however, I start to
>> wonder if the distinction between COW children, filter children, storage
>> children, metadata children, etc. isn't really what BdrvChildRole was
>> supposed to represent?
> 
> That’s a good point.
> 
>> Maybe we want to split off child_storage from child_file, though it's
>> not strictly necessary for this specific case because we want to treat
>> both metadata and storage nodes the same. But it could be useful for
>> other users of bdrv_storage_child(), if there are any.
> 
> Possible.  Maybe it turns out that at least for this series I don’t need
> bdrv_storage_child() at all.
> 
>>>>    As the job filter won't declare the target or any other involved
>>>>    nodes their storage nodes (I hope), this will do the right thing for
>>>>    them, too.
>>>>
>>>>    For quorum and blkverify both ways could be justifiable. I think they
>>>>    probably shouldn't declare their children as storage nodes. They are
>>>>    more like filters that don't have a single filtered node. So some
>>>>    kind of almost-filters.
>>>
>>> I don’t think quorum is a filter, and blkverify can only be justified to
>>> be a filter because it quits qemu when there is a mismatch.
>>>
>>> The better example is replication, but that has a clear filtered child
>>> (the primary node).
>>>
>>>
>>> So all in all I think it’s best to make the callback mandatory and add
>>> two global helper functions.  That’s simple enough and should prevent
>>> us from making mistakes by forgetting to adjust something in the
>>> future.
>>
>> Yes, that should work.
>>
>> We should probably still figure out what the relationship between the
>> child access functions and child roles is, even if we don't need it for
>> this solution. But it feels like an important part of the design.
> 
> Hm.  It feels like something that should be done before this series,
> actually.
> 
> So I think we should add at least a child role per child access function
> so that they match?  And then maybe in bdrv_attach_child() assert that a
> BDS never has more than one primary or filtered child (a filtered child
> acts as a primary child, too), or more than one COW child.  (And that
> these are always in bs->file or bs->backing so the child access
> functions do work.)

I’ve been trying to make this work, but I don’t think it does.  It just
feels all wrong and I need up with things like
“child_metadata_and_data”.  The last straw was that blkverify should
have the raw file be the filtered child (because, well, it’s bs->file),
but then the format file would need to be a non-filtered child, and
those would default to BDRV_O_PROTOCOL (which we decidedly don’t want).

Anyway, I’m currently attempting to solve this differently:
BdrvChildRole isn’t suitable for the job, I think.  The name is
completely what we want, but it actually doesn’t look like something
that describes the child role to me.

Instead, I’m introducing a new BdrvChildRole enum mask that describes
how the child is going to be used: stay-at-node, cow, metadata, data, etc.

I’m going to rename the current BdrvChildRole structure to
BdrvChildParent (in want of a better name), because really most of what
it does is describe the parent, but precisely not the child.  I’m moving
.stay_as_node to the new BdrvChildRole enum.

I hope this lets me unify child_file, child_backing, and child_format
into a child_of_bds object.  The callbacks should then decide the
particularities based on the BdrvChildRole enum.

Hope that makes sense. (? :S)

At least I feel much happier implementing it this way, which I suppose
is a good sign.

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]