[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_fil
From: |
Max Reitz |
Subject: |
Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback |
Date: |
Tue, 9 Jul 2019 11:45:09 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 |
On 09.07.19 10:55, Max Reitz wrote:
> On 09.07.19 05:08, Jason Dillaman wrote:
>> On Fri, Jul 5, 2019 at 6:43 AM Stefano Garzarella <address@hidden> wrote:
>>>
>>> On Fri, Jul 05, 2019 at 11:58:43AM +0200, Max Reitz wrote:
>>>> On 05.07.19 11:32, Stefano Garzarella wrote:
>>>>> This patch allows 'qemu-img info' to show the 'disk size' for
>>>>> the RBD images that have the fast-diff feature enabled.
>>>>>
>>>>> If this feature is enabled, we use the rbd_diff_iterate2() API
>>>>> to calculate the allocated size for the image.
>>>>>
>>>>> Signed-off-by: Stefano Garzarella <address@hidden>
>>>>> ---
>>>>> v3:
>>>>> - return -ENOTSUP instead of -1 when fast-diff is not available
>>>>> [John, Jason]
>>>>> v2:
>>>>> - calculate the actual usage only if the fast-diff feature is
>>>>> enabled [Jason]
>>>>> ---
>>>>> block/rbd.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> 1 file changed, 54 insertions(+)
>>>>
>>>> Well, the librbd documentation is non-existing as always, but while
>>>> googling, I at least found that libvirt has exactly the same code. So I
>>>> suppose it must be quite correct, then.
>>>>
>>>
>>> While I wrote this code I took a look at libvirt implementation and also
>>> at the "rbd" tool in the ceph repository: compute_image_disk_usage() in
>>> src/tools/rbd/action/DiskUsage.cc
>>>
>>>>> diff --git a/block/rbd.c b/block/rbd.c
>>>>> index 59757b3120..b6bed683e5 100644
>>>>> --- a/block/rbd.c
>>>>> +++ b/block/rbd.c
>>>>> @@ -1084,6 +1084,59 @@ static int64_t qemu_rbd_getlength(BlockDriverState
>>>>> *bs)
>>>>> return info.size;
>>>>> }
>>>>>
>>>>> +static int rbd_allocated_size_cb(uint64_t offset, size_t len, int exists,
>>>>> + void *arg)
>>>>> +{
>>>>> + int64_t *alloc_size = (int64_t *) arg;
>>>>> +
>>>>> + if (exists) {
>>>>> + (*alloc_size) += len;
>>>>> + }
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static int64_t qemu_rbd_get_allocated_file_size(BlockDriverState *bs)
>>>>> +{
>>>>> + BDRVRBDState *s = bs->opaque;
>>>>> + uint64_t flags, features;
>>>>> + int64_t alloc_size = 0;
>>>>> + int r;
>>>>> +
>>>>> + r = rbd_get_flags(s->image, &flags);
>>>>> + if (r < 0) {
>>>>> + return r;
>>>>> + }
>>>>> +
>>>>> + r = rbd_get_features(s->image, &features);
>>>>> + if (r < 0) {
>>>>> + return r;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * We use rbd_diff_iterate2() only if the RBD image have fast-diff
>>>>> + * feature enabled. If it is disabled, rbd_diff_iterate2() could be
>>>>> + * very slow on a big image.
>>>>> + */
>>>>> + if (!(features & RBD_FEATURE_FAST_DIFF) ||
>>>>> + (flags & RBD_FLAG_FAST_DIFF_INVALID)) {
>>>>> + return -ENOTSUP;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * rbd_diff_iterate2(), if the source snapshot name is NULL, invokes
>>>>> + * the callback on all allocated regions of the image.
>>>>> + */
>>>>> + r = rbd_diff_iterate2(s->image, NULL, 0,
>>>>> + bs->total_sectors * BDRV_SECTOR_SIZE, 0, 1,
>>>>> + &rbd_allocated_size_cb, &alloc_size);
>>>>
>>>> But I have a question. This is basically block_status, right? So it
>>>> gives us information on which areas are allocated and which are not.
>>>> The result thus gives us a lower bound on the allocation size, but is it
>>>> really exactly the allocation size?
>>>>
>>>> There are two things I’m concerned about:
>>>>
>>>> 1. What about metadata?
>>>
>>> Good question, I don't think it includes the size used by metadata and I
>>> don't know if there is a way to know it. I'll check better.
>>
>> It does not include the size of metadata, the "rbd_diff_iterate2"
>> function is literally just looking for touched data blocks within the
>> RBD image.
>>
>>>>
>>>> 2. If you have multiple snapshots, this will only report the overall
>>>> allocation information, right? So say there is something like this:
>>>>
>>>> (“A” means an allocated MB, “-” is an unallocated MB)
>>>>
>>>> Snapshot 1: AAAA---
>>>> Snapshot 2: --AAAAA
>>>> Snapshot 3: -AAAA--
>>>>
>>>> I think the allocated data size is the number of As in total (13 MB).
>>>> But I suppose this API will just return 7 MB, because it looks on
>>>> everything an it sees the whole image range (7 MB) to be allocated. It
>>>> doesn’t report in how many snapshots some region is allocated.
>>
>> It should return 13 dirty data blocks (multipled by the size of the
>> data block) since when you don't provide a "from snapshot" name, it
>> will iterate from the first snapshot to the HEAD revision.
>
> Have you tested that?
>
> I‘m so skeptical because the callback function interface has no way of
> distinguishing between different layers of snapshots.
>
> And also because we have the bdrv_block_status_above() function which
> just looks strikingly similar (with the difference that it does not
> invoke a callback but just returns the next allocated range). If you
> pass base=NULL to it, it will also “interpret that as the beginning of
> time and return all allocated regions of the image” (or rather image
> chain, in our case). But it would just return 7 MB as allocated. (Even
> though it does in fact return layer information, i.e. where a given
> continuous chunk of data is allocated.)
>
> Sure, there is no good reason for why our interface should by chance be
> the same as librbd’s interface. But without having tested it, the fact
> that the callback cannot detect which layer a chunk is allocated on just
> makes me wary.
And the librbd code doesn’t alleviate my concerns.
From what I can see, it first creates a bitmap (two bits per entry) that
covers the whole image and says which objects are allocated and which
aren‘t. Through the whole chain, that is. So in the above example, the
bitmap would report everything as allocated. (Or rather “updated“ in
librbd‘s terms.)
Then it has this piece:
> uint64_t off = m_offset;
> uint64_t left = m_length;
>
> while (left > 0) {
> uint64_t period_off = off - (off % period);
>
> uint64_t read_len = min(period_off + period - off, left);
>
> // map to extents
> map<object_t,vector<ObjectExtent> > object_extents;
> Striper::file_to_extents(cct, m_image_ctx.format_string,
> &m_image_ctx.layout, off, read_len, 0,
> object_extents, 0);
>
> // get snap info for each object
> for (map<object_t,vector<ObjectExtent> >::iterator p =
> object_extents.begin();
> p != object_extents.end(); ++p)
[...]
> for (std::vector<ObjectExtent>::iterator q = p->second.begin();
> q != p->second.end(); ++q) {
> r = m_callback(off + q->offset, q->length, updated,
> m_callback_arg);
[...]
> }
[...]
> }
>> left -= read_len;
> off += read_len;
> }
So that iterates over the whole image (in our case, because m_offset is
0 and m_length is the image length), then picks out a chunk of read_len
length, converts that to objects, and then iterates over all of those
objects and all of their extents.
file_to_extents looks like it’s just an arithmetic operation. So it
doesn‘t look like that function returns extents in multiple snapshots.
It just converts a range into subranges called “objects” and “extents”
(at least that’s the way it looks to me).
So overall, this looks awfully like it simply iterates over the whole
image and then returns allocation information gathered as a top-down
view through all of the snapshots, but not for each snapshot individually.
Max
signature.asc
Description: OpenPGP digital signature
- [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback, Stefano Garzarella, 2019/07/05
- Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback, Max Reitz, 2019/07/05
- Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback, Stefano Garzarella, 2019/07/05
- Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback, Jason Dillaman, 2019/07/08
- Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback, Max Reitz, 2019/07/09
- Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback,
Max Reitz <=
- Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback, Jason Dillaman, 2019/07/09
- Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback, Stefano Garzarella, 2019/07/09
- Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback, Max Reitz, 2019/07/09
- Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback, Jason Dillaman, 2019/07/09
- Re: [Qemu-devel] [PATCH v3] block/rbd: implement .bdrv_get_allocated_file_size callback, Stefano Garzarella, 2019/07/10