qemu-devel

Re: [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters.


From: Laurent Vivier
Subject: Re: [Qemu-devel] [patch 4/5][v2] Aggregate same type clusters.
Date: Mon, 11 Aug 2008 14:39:05 +0200

Hi Kevin,

BTW, I'm currently rewriting this patch...

On Monday, 11 August 2008 at 14:10 +0200, Kevin Wolf wrote:
> Laurent Vivier wrote:
> > Modify get_cluster_offset(), alloc_cluster_offset() and free_used_clusters()
> > to specify how many clusters we want.
> > 
> > Signed-off-by: Laurent Vivier <address@hidden>
> > ---
> >  block-qcow2.c |  212 
> > ++++++++++++++++++++++++++++++++++++++++++----------------
> >  1 file changed, 154 insertions(+), 58 deletions(-)
> > 
> > Index: qemu/block-qcow2.c
> > ===================================================================
> > --- qemu.orig/block-qcow2.c 2008-07-29 15:22:26.000000000 +0200
> > +++ qemu/block-qcow2.c      2008-07-29 15:22:28.000000000 +0200
> > @@ -575,32 +575,76 @@ static int l2_allocate(BlockDriverState 
> >      return 1;
> >  }
> >  
> > -static uint64_t get_cluster_offset(BlockDriverState *bs, uint64_t offset)
> > +static uint64_t get_cluster_offset(BlockDriverState *bs,
> > +                                   uint64_t offset, int *num)
> 
> I think you start to know what kind of comments I'll provide. So yes,
> here's another one of them: While it's intuitive what value I should
> pass for num, it's clearly not what the function will return in it. Or
> even what the function is doing at all.
> 
> This is how I understand it: The returned num is the number of
> contiguous clusters that can be read with a single read operation, i.e.
> they are all sparse, come from a backing file or are physically
> contiguous in the image file.

Yes

> Add a comment which says this and I'll be happy.

OK (I'll cut&paste the lines above ;-) )
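The requested comment might then read something like this (a sketch lifted almost verbatim from Kevin's description above, not the final wording):

```c
/*
 * get_cluster_offset
 *
 * For a given offset of the disk image, return the cluster offset in
 * the qcow2 file. On entry, *num is the requested number of sectors;
 * on return, it is the number of contiguous sectors that can be read
 * with a single read operation, i.e. they are all sparse, come from a
 * backing file or are physically contiguous in the image file.
 */
```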

> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      int l1_index, l2_index, ret;
> > -    uint64_t l2_offset, *l2_table, cluster_offset;
> > +    uint64_t l2_offset, *l2_table, cluster_offset, next;
> > +    int l1_bits;
> > +    int index_in_cluster, nb_available, nb_needed;
> >  
> > -    l1_index = offset >> (s->l2_bits + s->cluster_bits);
> > +    index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
> > +    nb_needed = *num + index_in_cluster;
> > +
> > +    l1_bits = s->l2_bits + s->cluster_bits;
> > +
> > +    nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));
> > +    nb_available = (nb_available >> 9) + index_in_cluster;
> 
> This could use a comment that nb_available is the remaining sectors in
> the L2 table (is it?) and that it is used in the following two
> conditions (the goto makes this non-obvious - at first, I thought that
> this value wouldn't be used at all)

nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));

is the number of bytes from offset to the end of the L1 entry.

nb_available = (nb_available >> 9) + index_in_cluster;

is the number of sectors from the first sector of the cluster we start
in to the last sector of the same L1 entry.

I do that because I don't want to handle the case where we run across an
L1 entry boundary.

I guess you want comments?

> > +
> > +    cluster_offset = 0;
> > +
> > +    l1_index = offset >> l1_bits;
> >      if (l1_index >= s->l1_size)
> > -        return 0;
> > +        goto out;
> >  
> >      if (!s->l1_table[l1_index])
> > -        return 0;
> > +        goto out;
> >  
> >      ret = l2_load(bs, l1_index, &l2_table, &l2_offset);
> >      if (ret == 0)
> > -        return 0;
> > +        goto out;
> 
> ret == 0 means that loading the L2 table failed. This is a real error,
> right? Isn't return 0 the right thing to do then?

Yes... :-P

> >  
> >      l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> >      cluster_offset = be64_to_cpu(l2_table[l2_index]);
> > +    nb_available = s->cluster_sectors;
> > +    l2_index++;
> > +
> > +    if (!cluster_offset) {
> > +
> > +       /* how many empty clusters ? */
> > +
> > +       while (nb_available < nb_needed && !l2_table[l2_index]) {
> > +           l2_index++;
> > +           nb_available += s->cluster_sectors;
> > +       }
> > +
> > +   } else {
> >  
> > -    return cluster_offset & ~QCOW_OFLAG_COPIED;
> > +       /* how many allocated clusters ? */
> > +
> > +       cluster_offset &= ~QCOW_OFLAG_COPIED;
> > +       while (nb_available < nb_needed) {
> > +           next = be64_to_cpu(l2_table[l2_index]) & ~QCOW_OFLAG_COPIED;
> > +           if (next != cluster_offset + (nb_available << 9))
> > +               break;
> > +           l2_index++;
> > +           nb_available += s->cluster_sectors;
> > +       }
> > +   }
> > +
> > +out:
> > +    if (nb_available > nb_needed)
> > +        nb_available = nb_needed;
> > +
> > +    *num = nb_available - index_in_cluster;
> > +
> > +    return cluster_offset;
> >  }
> >  
> >  static uint64_t free_used_clusters(BlockDriverState *bs, uint64_t offset,
> >                                uint64_t **l2_table, uint64_t *l2_offset,
> > -                                   int *l2_index)
> > +                                   int *l2_index, int *nb_clusters)
> 
> You would save some ifs if you didn't allow nb_clusters to be NULL.
> Passing a local variable containing 1 should do the very same thing and
> seems to be less error prone. Otherwise, put a note here which says what
> passing NULL means.
> 

Following your comments on patch 3, free_used_clusters() has been
removed from this patch...

> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      int l1_index, ret;
> > @@ -629,21 +673,63 @@ static uint64_t free_used_clusters(Block
> >      *l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> >      cluster_offset = be64_to_cpu((*l2_table)[*l2_index]);
> >  
> > -    if (cluster_offset & QCOW_OFLAG_COPIED)
> > +    if (nb_clusters && *nb_clusters > s->l2_size - (*l2_index))
> > +            *nb_clusters = s->l2_size - (*l2_index);
> > +
> > +    if (!cluster_offset) {
> > +        if (nb_clusters) {
> > +            int i = 1;
> > +            while (i < *nb_clusters && (*l2_table)[(*l2_index) + i] == 0) {
> > +                i++;
> > +            }
> > +            *nb_clusters = i;
> > +        }
> > +        return 0;
> > +    }
> > +
> > +    if (cluster_offset & QCOW_OFLAG_COPIED) {
> > +        if (nb_clusters) {
> > +            int i = 1;
> > +            uint64_t current;
> > +            while (i < *nb_clusters) {
> > +                current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> > +                if (cluster_offset + (i << s->cluster_bits) != current)
> > +                    break;
> > +                i++;
> > +            }
> > +            *nb_clusters = i;
> > +        }
> >          return cluster_offset;
> > +    }
> >  
> > -    if (cluster_offset) {
> > -        /* free the cluster */
> > -        if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> > -            int nb_csectors;
> > -            nb_csectors = ((cluster_offset >> s->csize_shift) &
> > -                           s->csize_mask) + 1;
> > -            free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & 
> > ~511,
> > -                          nb_csectors * 512);
> > -        } else {
> > -            free_clusters(bs, cluster_offset, s->cluster_size);
> > +    /* free the cluster */
> > +
> > +    if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> > +        int nb_csectors;
> > +        nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) 
> > + 1;
> > +        free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> > +                      nb_csectors * 512);
> > +        if (nb_clusters)
> > +            *nb_clusters = 1;
> > +        return 0;
> > +    }
> > +
> > +    if (nb_clusters) {
> > +        int i = 1;
> > +        uint64_t current;
> > +        while (i < *nb_clusters) {
> > +            current = be64_to_cpu((*l2_table)[(*l2_index) + i]);
> > +            if (cluster_offset + (i << s->cluster_bits) != current)
> > +                break;
> > +            i++;
> >          }
> > +        *nb_clusters = i;
> > +        free_clusters(bs, cluster_offset, i << s->cluster_bits);
> > +        return 0;
> >      }
> > +
> > +    free_clusters(bs, cluster_offset, s->cluster_size);
> > +
> >      return 0;
> >  }
> >  
> > @@ -657,7 +743,8 @@ static uint64_t alloc_compressed_cluster
> >      int nb_csectors;
> >  
> >      cluster_offset = free_used_clusters(bs, offset,
> > -                                        &l2_table, &l2_offset, &l2_index);
> > +                                        &l2_table, &l2_offset, &l2_index,
> > +                                        NULL);
> >      if (cluster_offset & QCOW_OFLAG_COPIED)
> >          return cluster_offset & ~QCOW_OFLAG_COPIED;
> >  
> > @@ -683,63 +770,80 @@ static uint64_t alloc_compressed_cluster
> >  
> >  static uint64_t alloc_cluster_offset(BlockDriverState *bs,
> >                                       uint64_t offset,
> > -                                     int n_start, int n_end)
> > +                                     int n_start, int n_end,
> > +                                     int *num)
> 
> The interface between get_cluster_offset and alloc_cluster_offset is
> inconsistent. In the former function, the value passed in num is used to
> determine the number of clusters to get. In the latter, num is an output
> parameter whose value isn't used. This is confusing.

Yes, I know, I have to think about this.
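The asymmetry Kevin describes is visible at the call sites in this very patch: in the read path, n carries the request in and the grant out, while in the write path the request travels in n_end and num is output only.

```c
/* qcow_read: n is in/out (sectors wanted in, sectors granted out) */
n = nb_sectors;
cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);

/* qcow_write: the request travels in n_end; n is output only */
cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
                                      index_in_cluster,
                                      index_in_cluster + nb_sectors,
                                      &n);
```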

> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      int l2_index, ret;
> >      uint64_t l2_offset, *l2_table, cluster_offset;
> > +    int nb_available, nb_clusters, i;
> > +    uint64_t start_sect;
> >  
> > +    nb_clusters = ((n_end << 9) + s->cluster_size - 1) >>
> > +                  s->cluster_bits;
> >  
> >      cluster_offset = free_used_clusters(bs, offset,
> > -                                        &l2_table, &l2_offset, &l2_index);
> > -    if (cluster_offset & QCOW_OFLAG_COPIED)
> > -        return cluster_offset & ~QCOW_OFLAG_COPIED;
> > +                                        &l2_table, &l2_offset, &l2_index,
> > +                                        &nb_clusters);
> > +    nb_available = nb_clusters << (s->cluster_bits - 9);
> > +    if (nb_available > n_end)
> > +        nb_available = n_end;
> > +
> > +    if (cluster_offset & QCOW_OFLAG_COPIED) {
> > +        cluster_offset &= ~QCOW_OFLAG_COPIED;
> > +        goto out;
> > +    }
> >  
> > -    /* allocate a new cluster */
> > +    /* allocate new clusters */
> >  
> > -    cluster_offset = alloc_clusters(bs, s->cluster_size);
> > +    cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size);
> >  
> >      /* we must initialize the cluster content which won't be
> >         written */
> >  
> > -    if ((n_end - n_start) < s->cluster_sectors) {
> > -        uint64_t start_sect;
> > -
> > -        start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> > +    start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> > +    if (n_start) {
> >          ret = copy_sectors(bs, start_sect, cluster_offset, 0, n_start);
> >          if (ret < 0)
> >              return 0;
> > -        ret = copy_sectors(bs, start_sect,
> > -                           cluster_offset, n_end, s->cluster_sectors);
> > +    }
> > +
> > +    if (nb_available & (s->cluster_sectors - 1)) {
> > +        uint64_t end = nb_available & ~(uint64_t)(s->cluster_sectors - 1);
> > +        ret = copy_sectors(bs, start_sect + end,
> > +                           cluster_offset + (end << 9),
> > +                           nb_available - end,
> > +                           s->cluster_sectors);
> >          if (ret < 0)
> >              return 0;
> >      }
> >  
> >      /* update L2 table */
> >  
> > -    l2_table[l2_index] = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
> > +    for (i = 0; i < nb_clusters; i++)
> > +        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
> > +                                             (i << s->cluster_bits)) |
> > +                                             QCOW_OFLAG_COPIED);
> > +
> >      if (bdrv_pwrite(s->hd,
> >                      l2_offset + l2_index * sizeof(uint64_t),
> >                      l2_table + l2_index,
> > -                    sizeof(uint64_t)) != sizeof(uint64_t))
> > +                    nb_clusters * sizeof(uint64_t)) !=
> > +                    nb_clusters * sizeof(uint64_t))
> >          return 0;
> >  
> > +out:
> > +    *num = nb_available - n_start;
> >      return cluster_offset;
> >  }
> >  
> >  static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
> >                               int nb_sectors, int *pnum)
> >  {
> > -    BDRVQcowState *s = bs->opaque;
> > -    int index_in_cluster, n;
> >      uint64_t cluster_offset;
> >  
> > -    cluster_offset = get_cluster_offset(bs, sector_num << 9);
> > -    index_in_cluster = sector_num & (s->cluster_sectors - 1);
> > -    n = s->cluster_sectors - index_in_cluster;
> > -    if (n > nb_sectors)
> > -        n = nb_sectors;
> > -    *pnum = n;
> > +    cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum);
> > +
> >      return (cluster_offset != 0);
> >  }
> >  
> > @@ -816,11 +920,9 @@ static int qcow_read(BlockDriverState *b
> >      uint64_t cluster_offset;
> >  
> >      while (nb_sectors > 0) {
> > -        cluster_offset = get_cluster_offset(bs, sector_num << 9);
> > +        n = nb_sectors;
> > +        cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);
> >          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> > -        n = s->cluster_sectors - index_in_cluster;
> > -        if (n > nb_sectors)
> > -            n = nb_sectors;
> >          if (!cluster_offset) {
> >              if (bs->backing_hd) {
> >                  /* read from the base image */
> > @@ -862,12 +964,10 @@ static int qcow_write(BlockDriverState *
> >  
> >      while (nb_sectors > 0) {
> >          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> > -        n = s->cluster_sectors - index_in_cluster;
> > -        if (n > nb_sectors)
> > -            n = nb_sectors;
> >          cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
> >                                                index_in_cluster,
> > -                                              index_in_cluster + n);
> > +                                              index_in_cluster + 
> > nb_sectors,
> > +                                              &n);
> >          if (!cluster_offset)
> >              return -1;
> >          if (s->crypt_method) {
> > @@ -940,11 +1040,9 @@ static void qcow_aio_read_cb(void *opaqu
> >      }
> >  
> >      /* prepare next AIO request */
> > -    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9);
> > +    acb->n = acb->nb_sectors;
> > +    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, 
> > &acb->n);
> >      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> > -    acb->n = s->cluster_sectors - index_in_cluster;
> > -    if (acb->n > acb->nb_sectors)
> > -        acb->n = acb->nb_sectors;
> >  
> >      if (!acb->cluster_offset) {
> >          if (bs->backing_hd) {
> > @@ -1046,12 +1144,10 @@ static void qcow_aio_write_cb(void *opaq
> >      }
> >  
> >      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> > -    acb->n = s->cluster_sectors - index_in_cluster;
> > -    if (acb->n > acb->nb_sectors)
> > -        acb->n = acb->nb_sectors;
> >      cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
> >                                            index_in_cluster,
> > -                                          index_in_cluster + acb->n);
> > +                                          index_in_cluster + 
> > acb->nb_sectors,
> > +                                          &acb->n);
> >      if (!cluster_offset || (cluster_offset & 511) != 0) {
> >          ret = -EIO;
> >          goto fail;
> 
> In the writing functions, you can't just assign a big n, because
> s->cluster_data will be too small when processing encrypted data. As you
> said you fixed a segfault, I think you know this one already.

yes, this is the cause of the segfault.
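One possible shape of the fix, purely as an assumption about the unshown v3 (it caps the request to what s->cluster_data can hold before the encryption path runs):

```c
/* hypothetical clamp before the encryption copy; assumes
 * s->cluster_data holds exactly one cluster of data */
if (s->crypt_method && n > s->cluster_sectors)
    n = s->cluster_sectors;
```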

Regards,
Laurent
-- 
----------------- address@hidden  ------------------
  "Perfection is achieved, not when there is nothing more to add,
but when there is nothing left to take away." Saint Exupéry




