RE: [Qemu-devel][PATCH,RFC] Zero cluster dedup
From: Shahar Frank
Subject: RE: [Qemu-devel][PATCH,RFC] Zero cluster dedup
Date: Wed, 3 Sep 2008 06:07:26 -0700
> -----Original Message-----
> From: Kevin Wolf [mailto:address@hidden]
> Sent: Wednesday, September 03, 2008 3:47 PM
> To: Shahar Frank
> Cc: address@hidden; Laurent Vivier
> Subject: Re: [Qemu-devel][PATCH,RFC] Zero cluster dedup
>
> >>> +
> >>> +static int is_not_zero(const uint8_t *buf, int len)
> >>> +{
> >>> +    int i;
> >>> +    len >>= 2;
> >>> +    for (i = 0; i < len; i++) {
> >>> +        if (((uint32_t *)buf)[i] != 0)
> >>> +            return 1;
> >>> +    }
> >>> +    return 0;
> >>> +}
> >> Why negative logic? I'd prefer buf_is_zero(). Also, could probably
> >> be an inline function.
> >>
> >
> > You are completely right. I just took this function from qemu-img,
> > but this is not a very good excuse... ;)
>
> Oh, I see. Then what about moving it into a common header file instead
> of copying?
OK.
>
> >>> +    uint64_t cluster_offset = sector_num << 9, poffset;
> >>> +
> >>> +    DEBUG("%s 0x%llx", bs->filename, cluster_offset);
> >>> +    if (!zero_dedup || (sector_num & (s->cluster_sectors - 1)) ||
> >>> +        *nb_sectors < cluster || is_not_zero(buf, s->cluster_size))
> >>> +        return 0; /* cannot/should not dedup */
> >> Why do you check the value of nb_sectors? It's never used and only
> >> an output parameter.
> >
> > This is wrong. nb_sectors is an input/output parameter.
>
> But if I'm not completely mistaken, you don't use its input value
> apart from this check. I just double-checked this and I still don't
> see its usage as an input parameter.
I will look into that again. Maybe you are right.
>
> >>> +
> >>> +    DEBUG("%s zero dedup 0x%llx", bs->filename, cluster_offset);
> >>> +    *nb_sectors = cluster; /* allocate exactly one cluster */
> >>> +
> >>> +    if (!s->zero_cluster) {
> >>> +        if (!(poffset = alloc_clusters_noref(bs, s->cluster_size)) ||
> >>> +            write_cluster(bs, sector_num, poffset, buf, cluster) < 0)
> >>> +            return 0;
> >>> +        s->zero_cluster = poffset;
> >>> +    }
> >> I'm not sure how one could best avoid this, but you are
> >> accumulating zero clusters because you create a new one for each VM
> >> run.
> >
> > This is true. It is not so bad, and the alternative as I see it is
> > to change qcow2 to remember the zero_cluster. At this stage I don't
> > think it justifies that. On the other hand, it would be very nice to
> > add qcow header extension support for optional features. In fact my
> > initial implementation did that (in a backwards compatible way), but
> > I intentionally left it out because it is a standalone issue.
>
> Ok. What about leaving it as it is for now and just adding a TODO
comment?
OK.
>
> >>> @@ -1315,6 +1411,13 @@ static void qcow_aio_write_cb(void *opaque, int ret)
> >>> qemu_aio_release(acb);
> >>> return;
> >>> }
> >>> + acb->n = acb->nb_sectors;
> >>> + } while ((ret = qcow_dedup(bs, acb->sector_num, acb->buf, &acb->n)) > 0);
> >> I'm not sure this loop is the right approach. When you get a big
> >> write request for zeros, what you are doing is to split it up into
> >> clusters and change and write the L2 table to disk for each single
> >> cluster.
> >>
> >> This is different from the other read/write functions which try to
> >> use as big chunks as possible to minimize the overhead.
> >
> > If I get your point, I tend to disagree. Even though there may be
> > cases where a large write will be broken up due to the zero dedup, I
> > think it will be rare, and the tradeoff of saving space should be
> > worth it.
>
> I don't think it will be that rare. After all, as you said in your
> initial comment, you are doing a cleanup of the complete disk. I'd
> expect that large requests will be quite common in such a case.
>
> Please note that I don't talk about trading off disk space against
> writes here, it's about optimizing the dedup.
>
> > As for the cluster granularity issue, it is meaningless for dedup
> > (zero or other). The deduped clusters are rarely deduped to
> > sequential clusters, and anyhow a dedup write operation just changes
> > the metadata of the block (logical offset L2 entry + reference count
> > of the physical cluster), so no large write optimization can be
> > done.
>
> This is still the L2 table which has to be written each time. Say we
> have a 64k write; then you could save 15 L2 table writes if
> qcow_dedup could handle zeroing multiple contiguous clusters at once.
> This could matter for performance when wiping lots of data.
You have a good point when it comes to a sequence of zero clusters (the
L2 writes can be optimized as you wrote above). I will re-implement
this.
>
> Laurent, do you have an opinion on this? You should know this stuff
> better than me as you did the according patches.
>
> Kevin
Shahar